This is an archive of the discontinued LLVM Phabricator instance.

Deduplicate replacements by FileEntry instead of file names.
ClosedPublic

Authored by ioeric on Nov 3 2016, 12:38 PM.

Details

Summary

The current version does not deduplicate equivalent file paths correctly.
For example, a relative path and an absolute path are considered inequivalent.
Comparing FileEnry addresses these issues.

Event Timeline

ioeric updated this revision to Diff 76881.Nov 3 2016, 12:38 PM
ioeric retitled this revision from to Deduplicate replacements by FileEntry instead of file names..
ioeric updated this object.
ioeric added a reviewer: djasper.
ioeric added a subscriber: cfe-commits.
alexander-shaposhnikov added inline comments.
lib/Tooling/Core/Replacement.cpp
578

(saves 2 lines of code) (although not particularly important)

if (ProcessedFileEntries.insert(FE).second)
   Result[Entry.first] = std::move(Entry.second);
ioeric updated this revision to Diff 76888.Nov 3 2016, 2:29 PM
  • Kill a few lines.
lib/Tooling/Core/Replacement.cpp
578

Always nice to save a few lines :) Thanks!

klimek added a comment.Nov 3 2016, 2:32 PM

If the files do not exist, how does this work? Won't that potentially give different results if the files exist locally vs if they don't?

ioeric added a comment.Nov 3 2016, 3:19 PM

If the files do not exist, how does this work? Won't that potentially give different results if the files exist locally vs if they don't?

This would be problematic. Thanks for the catch!

ioeric updated this revision to Diff 76894.Nov 3 2016, 3:19 PM
  • Addressed comments: handle non-existing files.
  • Addressed comments: handle non-existing files.

We're not really handling them now though? We're just printing an error?

My point is that we might run the replacement generation on a distributed system, and then group/deduplicate/apply them somewhere where the files might not actually exist (think the reduce stage of a mapreduce). If possible, I'd like to not rely on the existence of the file when we deal with Replacements. I'd find it especially problematic if the existence or non-existence of files changes the semantics of those operations.

ioeric added a comment.Nov 5 2016, 1:13 PM
  • Addressed comments: handle non-existing files.

We're not really handling them now though? We're just printing an error?

My point is that we might run the replacement generation on a distributed system, and then group/deduplicate/apply them somewhere where the files might not actually exist (think the reduce stage of a mapreduce). If possible, I'd like to not rely on the existence of the file when we deal with Replacements. I'd find it especially problematic if the existence or non-existence of files changes the semantics of those operations.

This function is only used in replacements application phase, and I think this is the most easiest if not the best way to ensure that two sets of replacements for the same file are applied without needing to handle dots and relative/absolute paths.

For distributed replacements generation, I think it makes more sense to duplications on the replacements application phase since this needs to be done when combining all replacements anyway.

It might make more sense to make this function local in tooling/Refactoring.cpp. But in that case, I can't unit test it :(

klimek added a comment.Nov 5 2016, 5:35 PM

Ok, makes sense. Can you add a comment to the function that explains what happens with replacements for files that do not exist (we ignore them, unless I'm mistaken). Otherwise LG.

ioeric updated this revision to Diff 76975.Nov 5 2016, 5:46 PM
  • addressed comments.
ioeric added a comment.Nov 6 2016, 9:19 AM

Since Manuel's comment has been addressed, I'd like to land this if there is no objection.

ioeric updated this revision to Diff 77008.Nov 6 2016, 10:09 PM
  • Merge with origin/master
This revision was automatically updated to reflect the committed changes.