This is an archive of the discontinued LLVM Phabricator instance.

Merge manifest namespaces.
AcceptedPublic

Authored by ecbeckmann on Aug 1 2017, 5:19 PM.

Details

Reviewers
ruiu
rnk
zturner
Summary

mt.exe performs a tree merge where certain element nodes are combined into one. This introduces the possibility of xml namespaces conflicting with each other. The original mt.exe has a hierarchy whereby certain namespace names can override others, and nodes that would then end up in ambigious namespaces have their namespaces explicitly defined. This namespace handles this merging process.

Event Timeline

ecbeckmann created this revision.Aug 1 2017, 5:19 PM
ecbeckmann updated this revision to Diff 109264.Aug 1 2017, 7:31 PM

Added comments.

ruiu edited edge metadata.Aug 1 2017, 10:14 PM

Before looking into details, there are a few things I want to say.

  • This is a large patch but the commit message is too terse. Please write it to describe what this patch does.
  • Now you have a lot of functions that are individually guarded by #ifdef's, which is somewhat error-prone. It is probably better to have one larger #ifdef and put all functions that are called only when libxml is available into it.

Updated commit message.

ecbeckmann edited the summary of this revision. (Show Details)Aug 2 2017, 11:42 AM

Put all libxml-only functions in a single ifdef.

rnk added inline comments.Aug 2 2017, 11:53 AM
llvm/include/llvm/WindowsManifest/WindowsManifestMerger.h
38

This will probably have internal linkage and get duplicated into every TU that includes this header. Can you sink this down into the .cpp file?

This is also just static data, so consider making this an array like this:

static const std::pair<const char *, const char *> MtNsHrefsPrefixes[] = { .... };

That should avoid some dynamic initialization, while still letting you loop over it and make StringRefs: https://llvm.org/docs/CodingStandards.html#do-not-use-static-constructors

llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
64

I feel like this would be simpler as:

// If either pointer is null, use pointer equality. If both are null, they are equal, otherwise they are unequal.
if (!A || !B)
  return A == B;
69

Right now, these functions are externally visible symbols in the global namespace. We should take some steps to hide them from users of LLVM. Any free function that you don't need outside this file should be static: https://llvm.org/docs/CodingStandards.html#anonymous-namespaces

121

Echoing what @ruiu said, I was imagining that we'd have one large ifdef between stub implementations of WindowsManifest::merge etc, and the real implementations and all of the helpers you wrote here that call libxml2.

Fix formatting errors caused by removing ifdefs.

ecbeckmann updated this revision to Diff 109404.Aug 2 2017, 1:15 PM
ecbeckmann marked 4 inline comments as done.

Avoid static constructor, remove symbols from global namespace, separate real and stub implementations, simplify string compare.

rnk added inline comments.Aug 2 2017, 1:17 PM
llvm/include/llvm/WindowsManifest/WindowsManifestMerger.h
58–59

Perhaps we should hide the implementation details here with the pimpl pattern: http://en.cppreference.com/w/cpp/language/pimpl

Then we don't need to include xmlreader.h above behind ifdefs in the header, and clients will not have to worry about possible name or macro conflicts.

The "Impl" class would store the vector of merged XML documents.

ecbeckmann updated this revision to Diff 109429.Aug 2 2017, 2:05 PM

Use PImpl for WindowsManifestMerger.

ecbeckmann marked an inline comment as done.Aug 2 2017, 2:06 PM
ecbeckmann updated this revision to Diff 109433.Aug 2 2017, 2:34 PM

Added newline to manifest

ruiu added inline comments.Aug 3 2017, 2:00 AM
llvm/include/llvm/WindowsManifest/WindowsManifestMerger.h
49–51

Usually we write private members after public members. Consistency matter, so please make new code consistent with existing code in general.

llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
79–80

Please run clang-format every time you upload a new patch to phab.

272–279

No else after return (apply this rule to other places in this file.)

702

clang-format.

zturner edited edge metadata.Aug 3 2017, 11:31 AM

Mostly style issues. I'd like to see all the char pointers go away in favor of StringRef, unless there's a strong reason why this won't work.

llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
53–60

Why not have this be a vector of pairs of StringRef? We're not modifying anything, and these will be in constant storage, so I think StringRef is fine? Maybe even a std::map<StringRef, StringRef>, which can still use initializer list syntax.

69

It would be nice if we could deal entirely with StringRef and do conversions at the highest level possible so we never have to deal with this stuff.

StringRef toStringRef(const unsigned char *Str) {
  return StringRef(FROM_XML_CHAR(Str));
}

static bool isMergeableElement(StringRef S) {
  constexpr StringRef KnownList[] = { ... };
  return llvm::is_contained(KnownList, S);
}

then the above xmlStringsEqual() also just becomes toStringRef(X) == toStringRef(Y), which would most likely just turn into X == Y because you'll be getting rid of the pointer-ness at the highest level and everything will actually be dealing with StringRef behind the scenes.

90–91

For a concrete example of the previous suggestion:

static xmlAttrPtr getAttribute(xmlNodePtr Node, StringRef AttributeName) {
  for (auto Attribute = Node->properties; Attribute; Attribute = Attribute->next) {
    if (toStringRef(Attribute->Name) == AttributeName)
      return true;
  }
  return false;
}
102–103

namespaceOverrides(StringRef HRef1, StringRef HRef2)

111–115

You can use the range-based llvm::find_if(MtNsHrefsPrefixes, [HRef2]...); Alternatively, if my initial suggestion of making it a map works, you could just write: auto HRef2Position = MtNsHrefsPrefixes.find(HRef2);

ecbeckmann marked 4 inline comments as done.Aug 7 2017, 12:59 PM

Mostly style issues. I'd like to see all the char pointers go away in favor of StringRef, unless there's a strong reason why this won't work.

I agree changing to StringRef would be good for consistency. However there are a couple reasons I see not to. One is just simply the string type libxml2 uses is unsigned char *, so keeping as such whenever possible reduces the number of conversions. Second, my merging implementation allows nullptr to be a valid string value, and is used to designate the prefix of a default xml namespace, and libxml also uses this representation. Therefore we have things like making nullptr comparisons, which StringRef explicitly disallows.

llvm/include/llvm/WindowsManifest/WindowsManifestMerger.h
49–51

Sorry, this was leftover from a previous version where I had inline forwarding calls in the header, so I needed the implementation class defined prior.

llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
79–80

I had run clang-format before uploading, however I believe the problem is I was diffing against the wrong previous revision so not all my changes were covered.

272–279

also changed 2 other places

ecbeckmann updated this revision to Diff 110073.Aug 7 2017, 1:27 PM
ecbeckmann marked 5 inline comments as done.

Fix formatting, remove else after return, switched array for map

ruiu added a comment.Aug 15 2017, 1:27 PM

Publishing comments about coding style first. I'll do real code review later.

llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
82–85

I think I wrote this before, but I believe we usually avoid this style

for (...)
  if (...) {
  }

You want to either use this style

for (...) {
  if (...) {
  }
}

or

for (...)
  if (...)
    ....
283–285
if (foo)
  return true;
return false;

is equivalent to

return foo;
293–294

else after return

rnk added inline comments.Aug 15 2017, 1:34 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
293–294

This can be just return hasInheritedNs(Node) && Node->ns->prefix == nullptr;

299

ditto, you can return the condition.

ruiu added inline comments.Aug 15 2017, 1:48 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
110–112

Since HRef1Position and HRef2Position are iterators for the same map, you can compare them directly, can't you?

ruiu added inline comments.Aug 15 2017, 1:48 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
132–136

Sorry for picking up styles, but even in the same patch, you are using two styles. Here, you are doing

for (...) {
  if (...) {
  }
}

but in the above function you did

for (...) {
  if (...)
    ...
}

Consistency matters because it helps people who read your code, so please pay a bit more attention to consistency. It is also a good practice to read your code yourself several times after you think you "finish" your patch, as there's always something you can improve as a finishing touch. It will shorten the time for code review as (1) you will unlikely to get nitpicking comments on style and (2) consistent coding style makes code reviewers more confident that you know what you are doing.

146–152

Maybe a better way of writing this is to use assignments in ifs.

if (xmlNsPtr Def = search(HRef, Node))
  return Def;
if (xmlNsPtr  Def = xmlNewNs(Node, HRef, getPrefixForHref(HRef)))
  return Def;
return make_error<WindowsManifestError>("failed to create new namespace");
173

Remove this blank line.

218

It's good to do continue here, as early continues are preferred.

235

Ditto

255–256

Ditto

Only 1 nitpick, I think reid and rui covered the rest pretty well.

llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
598–599

!llvm::is_contained(RequiredPrefixes, Def) will allow you to get this all on a single line. Similar applies to other places in the file.

zturner added inline comments.Aug 15 2017, 2:43 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
598–599

Also, invert the conditional first and then put a continue.

if (!Def->prefix || llvm::is_contained(RequiredPrefixes, Def)) {
  Prev = Def;
  continue;
}

...

This keeps the indentation down and makes it easier to read.

ecbeckmann marked 9 inline comments as done.

Style consitency updates.

ecbeckmann added inline comments.Aug 15 2017, 2:48 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
82–85

Ah okay....I think I've heard other variations of a similar principle, but not this exact style rule. I'll change this everywhere.

110–112

the only way to directly compare iterators is using std::distance. However, behavior is undefined if the 2nd iterator is not reachable by incrementing the first iterator, which will often be the case for our usage. Therefore, the only way to correctly compare is to first find each one's absolute position in the map, which I do here.

132–136

I understand, consistency is important. I have been reading the code myself before sending it out for review, but I guess there are certain areas I need to focus on more.

zturner added inline comments.Aug 15 2017, 2:51 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
110–112

Maps are ordered, so why not just use the ordering function directly?

return FROM_XML_CHAR(HRef1) > FROM_XML_CHAR(HRef2)?
ecbeckmann marked 2 inline comments as done.

use llvm::is_contained

ecbeckmann added inline comments.Aug 15 2017, 3:26 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
110–112

Unfortunately HRef1 and HRef2 do not point within the table, but rather xmlChar arrays allocated by libxml2's parser, so we can't compare those.

zturner added inline comments.Aug 15 2017, 3:51 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
110–112

I think I'm still mis-understanding something.

The code here is looking both entries up by key, then after determining that both keys are present, it then computes each iterator's distance from the end. And if the first one is farther from the end (e.g. closer to the beginning) than the second one, it returns true, otherwise it returns false.

But when you iterate a map, it iterates in key order. So saying "the first one is closer to the beginning" is identical to saying "the first one is less than the second one". So as long as you use the same comparison function that the map uses internally to order keys, then it should be the same. And in this case, the map seems to be using a string comparison.

So it seems like all you have to do is string compare the two arguments and return true if the first one is less than the second one?

ecbeckmann added inline comments.Aug 15 2017, 4:20 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
110–112

I'm no longer using a map, but rather a vector of pairs, therefore the keys stay in the same order as they were put in the definition. No comparison operator is used on the StringRefs. We purely want to find the index in the vector and compare them. So using StringRef / string comparison will not work here.

zturner added inline comments.Aug 15 2017, 4:27 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
110–112

Ahh, I see. I didn't realize we were no longer using a map. In that case, why doesn't return Iter1 < Iter2; work? vector::iterator is a random accessor iterator, which supports relational operators.

ecbeckmann added inline comments.Aug 15 2017, 4:39 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
110–112

find() in this case returns type "std::_Rb_tree_const_iterator" which does not support comparison. I think it is because it is a vector of pairs, so this iterator is used? I feel like a complicated template specification is required to get it to return vector::iterator

zturner added inline comments.Aug 15 2017, 4:43 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
110–112

Something doesn't seem right. If it's std::_Rb_tree_const_iterator, then this can't possibly be a vector of pairs, it's a std::map<StringRef, StringRef>.

ecbeckmann added inline comments.Aug 15 2017, 4:56 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
110–112

Ah shoot my apologies, I was looking at the wrong revision. It indeed uses a map. Actually, this seems to be a bug I inadvertantly created, because I assume the sort will put it in alphabetical order, when really it should stay in the same order. I will change to a vector and use position comparison on iterators.

Changed to vector of StringRefs so that no reordering happens.

ecbeckmann marked 2 inline comments as done.Aug 15 2017, 5:05 PM

Add test case to cover incorrect namespace prioritization.

ruiu added inline comments.Aug 16 2017, 3:24 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
102

Overall, you can simplify this function as

auto It1 = llvm::find_if(
    MtNsHrefsPrefixes,
    [=](const std::pair<StringRef, StringRef> &element) {
      return xmlStringsEqual(HRef1, TO_XML_CHAR(element.first.data()));
    });
auto It2 = llvm::find_if(
    MtNsHrefsPrefixes,
    [=](const std::pair<StringRef, StringRef> &element) {
      return xmlStringsEqual(HRef2, TO_XML_CHAR(element.first.data()));
    });
return It1 < It2;
106

Coding style issue: element -> Element

113

Instead of explicitly capturing variables, use [=] or [&].

element -> Element

116–118

I believe you can just remove

if (HRef2Position == std::end(MtNsHrefsPrefixes))
  return true;

because if HRef2Position is std::end(MtNsHrefsPrefixes), expression HRef1Position < HRef2Position is always true in this context.

218

It's not done.

255–256

Not done.

ecbeckmann marked 4 inline comments as done.

Shorten overriding namespace function.

ecbeckmann added inline comments.Aug 16 2017, 5:34 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
218

are you referring to the if block or the else if block? Because on this comment's revision it refers to the else if and I have added a continue since then.

255–256

same

Remove extra line.

ruiu added inline comments.Aug 16 2017, 5:56 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
218

What I was saying is that, if you have this code

for (...) {
  if (...) {
    if (...) {
      ...
    } else {
      ...
    }
  } else if (...) {
    ...
  } else if (...) {
    ...
  }
}

then you can rewrite it with early continues like this:

for (...) {
  if (...) {
    if (...) {
      ...
      continue;
    }
    ...
    continue;
  }

  if (...) {
    ...
    continue;
  }

  ...
}

Add last early continue.

ecbeckmann added inline comments.Aug 16 2017, 6:18 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
218

ah okay. As of the most recent revision this criteria should be fulfilled.

ruiu added inline comments.Aug 16 2017, 6:21 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
221

Well, you want to remove continue, as I described in the previous comment. (Please fix not only this place but other places as well.)

ecbeckmann added inline comments.Aug 16 2017, 6:46 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
221

wait I'm confused, I thought we want to add the early continues?

ruiu added inline comments.Aug 16 2017, 10:04 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
221

Ah sorry, I meant not continue but else.

finish early continue

ecbeckmann marked 2 inline comments as done.Aug 17 2017, 6:21 PM
rnk edited edge metadata.Aug 17 2017, 6:27 PM

I think this is pretty close to ready. How do other reviewers feel?

llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
116

typo on "recursively"

221

Most of these if / else branches are about equivalent in terms of complexity. I'd actually suggest removing all the continues that got added (except the !Attribute->ns check). Most of the continues in the current code are redundant, because they'd hit the end of the loop naturally anyway.

ecbeckmann marked an inline comment as done.

Fix comment typo.

ecbeckmann added a comment.EditedAug 18 2017, 1:53 PM
This comment has been deleted.
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
221

I think there's a small benefit in terms of readability

ruiu added inline comments.Aug 18 2017, 2:18 PM
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp
274–276

No else after return

llvm/test/tools/llvm-mt/big_merge.test
5–8

Maybe it is better to write

/manifest <path1> \
/manifest <path2> \
...

instead of

<path1> /manifest \
<path2> /manifest \
 ...

because paths are arguments for /manifest.

llvm/tools/llvm-mt/llvm-mt.cpp
106–107

!A && !B is preferred over !(A || B).

But do you have to do this way? I think usually we handle OPT_UNKNOWN to report unknown options instead of putting all known options to some groups. Look at https://github.com/llvm-project/llvm-project-20170507/blob/dd24779ef3e3e592cc0c7561a2fb1fbe3309fa1c/lld/ELF/DriverUtils.cpp#L112 for example.

ecbeckmann marked an inline comment as done.

Removed else after return, also switched to early return in 2 other places.

ecbeckmann marked an inline comment as done.Aug 18 2017, 2:53 PM
ecbeckmann added inline comments.
llvm/tools/llvm-mt/llvm-mt.cpp
106–107

I have tried this....however for some reason OPT_UNKNOWN never filters any arguments for me. It seems for some reason the invalid flags are recognized as inputs, not flags. I'm not sure if I'm setting up the options incorrectly or if this is a bug in the option library.....also not sure if this is worth the effort to fix.

ecbeckmann marked an inline comment as not done.Aug 18 2017, 2:53 PM
ruiu added inline comments.Aug 18 2017, 3:05 PM
llvm/tools/llvm-mt/llvm-mt.cpp
106–107

Ah, okay, I think I know why it behaves like that. Unlike Unix, Windows command line options start with '/', and the character is ambiguous as to whether it is a path (e.g. /bin/ls) or an option (e.g. /manifest). LLVM's option parser's behavior is that, if an option "/foo" is defined, "/foo" is handled as an option. Otherwise, "/foo" is handled as a path.

This is an unavoidable consequence if you allow both Unix-style paths and Windows-style command line options. Since that's what we want, it is okay.

I think Group<supported> does not do anything. I believe your llvm-mt command still does not warn on unknown options. Please try llvm-mt /foo, for example. My guess is that it will warn that it cannot find file "/foo". You may want to just remove Group<supported> from Options.td and also remove this piece of new code.

ecbeckmann marked 2 inline comments as done.

Reformat test.

llvm/tools/llvm-mt/llvm-mt.cpp
106–107

I think you're right. It also seems to be somewhat of a bug with the option parsing library, OPT_UNKNOWN never seems to do anything. I just tried with lld-link and it seems unable to detect invalid flags, complaining rather that it can't open the given file. So that line in DriverUtils.cpp doesn't actually do anything.

My Group<supported> example does work however. For example, running llvm-mt /foo does indeed detect the invalid option given, instead of interpreting it as an input file.

ruiu added inline comments.Aug 18 2017, 4:41 PM
llvm/tools/llvm-mt/llvm-mt.cpp
106–107

OPT_UNKNOWN does work if you pass an argument with - prefix. Try -foo for example. I think it'll be handled as an OPT_UNKNOWN option.

I believe that your Group<supported> doesn't really work. Well, it's working in some sense, but this is the same as iterating over OPT_INPUT, no? I mean, I believe your code is the same as

for (auto *Arg : Args.filtered(OPT_INPUT))
  reportError(Twine("invalid option ") + Arg->getSpelling());

Remove supported group option

ecbeckmann marked an inline comment as done.Aug 18 2017, 4:57 PM

remove unnecessary encoding change.

ruiu accepted this revision.Aug 18 2017, 5:07 PM

LGTM

llvm/tools/llvm-mt/llvm-mt.cpp
105–106

Please add a test for an unknown option.

This revision is now accepted and ready to land.Aug 18 2017, 5:07 PM

Added test to handle unknown option