So far, I have implemented the GumTree algorithm (https://github.com/GumTreeDiff/gumtree/) to match AST nodes in the source file with their equivalents in the destination file. It combines a heuristic top-down search with an optimal algorithm for small subtrees.
The clang-diff tool will mimic the output of GumTree's textual diff: it prints the matched nodes (each identified by its postorder offset) and an edit script consisting of insertions and deletions.
Note that every node is matched separately, even if large subtrees are equivalent, so the output is rather verbose.
clang-diff src.cpp dst.cpp
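To give reviewers a rough idea of what "matched nodes identified by their postorder offset" means, here is a minimal sketch of the top-down part only. It is not the code in this patch and not the GumTree implementation: the `Node` class, the string digest, and `top_down_match` are hypothetical names chosen for illustration, ambiguous candidates are resolved by simply taking the first unused one, and the optimal matching for small subtrees is left out entirely.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    post_id: int = -1   # postorder offset, filled in by annotate()
    height: int = 1     # leaves have height 1
    digest: str = ""    # structural fingerprint of the whole subtree

def annotate(root):
    """Assign postorder ids, heights and digests; return nodes in postorder."""
    order = []
    def visit(n):
        for c in n.children:
            visit(c)
        n.post_id = len(order)
        n.height = 1 + max((c.height for c in n.children), default=0)
        n.digest = n.label + "(" + ",".join(c.digest for c in n.children) + ")"
        order.append(n)
    visit(root)
    return order

def preorder(n):
    yield n
    for c in n.children:
        yield from preorder(c)

def top_down_match(src, dst):
    """Greedily match identical subtrees, tallest first.

    Returns a sorted list of (src postorder id, dst postorder id) pairs.
    """
    src_nodes, dst_nodes = annotate(src), annotate(dst)
    by_digest = {}
    for d in dst_nodes:
        by_digest.setdefault(d.digest, []).append(d)
    matches, used = {}, set()
    for s in sorted(src_nodes, key=lambda n: -n.height):
        if s.post_id in matches:
            continue  # already covered by a taller matched subtree
        free = [d for d in by_digest.get(s.digest, []) if d.post_id not in used]
        if not free:
            continue
        # Equal digests mean equal shape, so preorder walks line up pairwise.
        for a, b in zip(preorder(s), preorder(free[0])):
            matches[a.post_id] = b.post_id
            used.add(b.post_id)
    return sorted(matches.items())

# f(a, g(b, c))  versus  f(g(b, c), a, d)
src = Node("f", [Node("a"), Node("g", [Node("b"), Node("c")])])
dst = Node("f", [Node("g", [Node("b"), Node("c")]), Node("a"), Node("d")])
print(top_down_match(src, dst))  # [(0, 3), (1, 0), (2, 1), (3, 2)]
```

In this example the two roots and the new leaf `d` stay unmatched. A node such as `d` is what would show up as an insertion in the edit script.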
The trees that are used to create the matchings and the edit script can be serialized to JSON. I use this for testing: the matchings and the edit script are compared against those generated by a prototype implementation.
clang-diff -ast-dump src.cpp
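The comparison against the prototype could look roughly like the sketch below. The JSON layout (`id`, `type`, `children`) and the helper names `serialize` and `diff_matchings` are assumptions made for illustration. They are not the format that `-ast-dump` emits, nor the actual test script.

```python
import json

def serialize(tree):
    """Serialize a (label, children) tuple tree to JSON, numbering nodes in postorder."""
    counter = 0
    def build(node):
        nonlocal counter
        label, children = node
        kids = [build(c) for c in children]  # children are numbered first
        obj = {"id": counter, "type": label, "children": kids}
        counter += 1
        return obj
    return json.dumps(build(tree), sort_keys=True)

def diff_matchings(expected, actual):
    """Return (pairs missing from actual, pairs only in actual)."""
    exp = {tuple(p) for p in expected}
    act = {tuple(p) for p in actual}
    return sorted(exp - act), sorted(act - exp)

dumped = json.loads(serialize(("f", [("a", []), ("b", [])])))
print(dumped["id"], [c["id"] for c in dumped["children"]])  # 2 [0, 1]

missing, unexpected = diff_matchings([[0, 3], [1, 0]], [[0, 3], [2, 1]])
print(missing, unexpected)  # [(1, 0)] [(2, 1)]
```

Comparing sets of id pairs rather than raw text keeps the check independent of the order in which either implementation prints its matchings.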
Future features:
- detect code movements (not just inserts / updates / deletes)
- proper visualization (such as http://www.yinwang.org/resources/diff1-diff2.html)
- recursively compare directories
- proper handling of preprocessor macros
Missing doc comments? You could provide a brief, high-level overview of the algorithm that you're implementing.