After applying replacements, redundant code like extra commas or empty namespaces
might be introduced. Fixer can detect and remove any redundant code introduced by replacements.
The current implementation only handles redundant commas.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
include/clang/Format/Format.h | ||
---|---|---|
813 ↗ | (On Diff #51905) | I am adding reformatting after fixing, and I am wondering if we can get rid of the Style parameter here (since the style does not really matter in code fixing, I think?) and let the reformat handling the style with the use case like: FinalReplaces = formatReplacements(Code, fixReplacements(Code, Replaces), Style); where fixReplacements is a function parallel to formatReplacements. |
- Added fix for empty namespace.
- Merge multiple continuous token deletions into one big replacement; minor code styling.
- refactored code to reduce redundancy.
include/clang/Format/Format.h | ||
---|---|---|
770 ↗ | (On Diff #52547) | I think we should not automatically format the replacements after fixing. This function can easily be combined with formatReplacements if that is desired. |
802 ↗ | (On Diff #52547) | This sentence doesn't make sense. |
803 ↗ | (On Diff #52547) | I believe the style will become important as certain aspects of fixing are going to be controlled via the style. E.g. some styles might want to turn: if (a) return; into: if (a) { return; } |
lib/Format/Format.cpp | ||
1446 ↗ | (On Diff #52547) | I'd probably move all of these out into a class so you don't need to pass the same set of parameters between all of them and so that you can make those private that aren't meant to be called directly. |
1596 ↗ | (On Diff #52547) | I am not sure, this is the right class to pull out. It still has a lot of overlap with formatter. Maybe it is instead a better idea to pull out each fixer into its own class and pull them into the formatter. |
1651 ↗ | (On Diff #52547) | The Tokens parameter is unused!? |
1654 ↗ | (On Diff #52547) | I am not (yet) convinced that the fundamental design here makes sense. So far it seems to me that all fixers will either operate within a single line (ctor fixer and probably most others) or that they do something specific on multiple lines (namespace fixer and empty public/private section fixer and maybe others). Creating a new way to iterate over all tokens of all lines doesn't really add useful functionality here AFAICT. |
1657 ↗ | (On Diff #52547) | No braces. Here and at all other single-statement ifs. |
1690 ↗ | (On Diff #52547) | Having a function called "reset" that is only called once is weird. Just do this in the constructor? |
1748 ↗ | (On Diff #52547) | Comment on what this is doing. |
1749 ↗ | (On Diff #52547) | Ed? Either use E or End. |
1891 ↗ | (On Diff #52547) | Why is this needed? |
1909 ↗ | (On Diff #52547) | A lot of these need documentation. What's the current line? What is the last token (last token that was read? last token of line? last token of file?). What are RedundantTokens? |
2356 ↗ | (On Diff #52547) | Too much code duplication. Find a way to avoid that (passing in the calls to either reformat() or fix() as lambdas to a shared internal function). |
unittests/Format/FormatTest.cpp | ||
11236 ↗ | (On Diff #52547) | All of this should go into a different file. |
- Refactored the code to reduce code duplication. Code styling. Moved test cases for Fixer into a new file FixTest.cpp.
- Added RangeManager to manage affected ranges. make empty namespace fixer delete tokens not in an affected line.
lib/Format/Format.cpp | ||
---|---|---|
1596 ↗ | (On Diff #52547) | But I'm not quite sure if implementing the fixer in formatter works better since they look like two parallel classes to me. And it might make the formatter more complicated to use if we mix it with fixer. I think adding another layer of abstraction might be one way to reduce duplication? |
1654 ↗ | (On Diff #52547) | I've also considered the way formatter does, which iterates over lines. But it seems to be easier to write fixer, especially multi-line fixer, if we can just ignore lines. Working with lines seems to add another layer of complexity here? |
lib/Format/Format.cpp | ||
---|---|---|
1654 ↗ | (On Diff #52676) | That I don't understand. Almost all fixers (cleaning up commas, constructor initializers, etc.) will only ever look at a single line. The matchers that do make multiline fixes (empty namespaces, empty public/private sections) only look at complete lines, i.e. only ever use AnnotatedLine::startsWith(...) and don't iterate over the tokens, AFAICT. I really think that giving the option to iterate over all tokens of multiple lines does more harm than good. Among several other things, error recovery is made harder. It never makes sense for the ctor-initializer fixer to leave its line (or else you'll just ignore some nice error recovery already implemented in the UnwrappedLineParser). |
1693 ↗ | (On Diff #52676) | I have a comment here (and in some other places), but I think this will be different anyway if we move to line-based fixers. So I am holding off with those comments for now. |
1718 ↗ | (On Diff #52676) | Just go to CurrentToken->MatchingParen!? |
1824 ↗ | (On Diff #52676) | I am happy to show you how the implementation here gets much simpler if you only iterate over AnnotatedLines. |
1916 ↗ | (On Diff #52676) | I think "redundant" is the wrong word here. How about DeletedTokens? |
- Change implementation of fixer to iterate by lines. TODO: refactor Formatter and Fixer to reduce duplication.
include/clang/Format/Format.h | ||
---|---|---|
769 ↗ | (On Diff #52832) | I am actually not sure that fixReplacements is the right terminology here (sorry that I haven't discovered this earlier). It isn't really fixing the replacements (they aren't necessarily broken). How about cleanupAroundReplacements? Manuel, what do you think? |
lib/Format/Format.cpp | ||
1596 ↗ | (On Diff #52832) | But like this, it is way to much code duplication for me, both in number of lines of code as well as code complexity. Basically everything in fix() is duplicated along with most of the class members etc. The main difference seems to be that runFix() is called instead of format. There must be a way to have only one implementation of that code. I think (but cannot see all the consequences yet) you should merge the two classes. And if that leads to too much complexity, you can probably pull out runFix and everything it calls into a separate class as well as possibly format() and everything that format calls. |
1597 ↗ | (On Diff #52832) | Why are you doing this? |
1664 ↗ | (On Diff #52832) | Can you move the ConstructorInitializer fixer and everything that belongs to it to a different patch? I want to look at that in more detail and two smaller patches make this easier for me. |
1672 ↗ | (On Diff #52832) | No need for this boolean. Write this as: for (FormatToken *Tok = Line.First;; Tok = Tok->Next) { if (!Tok->is(tok::comment)) return false; } return true; And maybe we should (in a separate patch) add an iterator interface to AnnotatedLine so that we can use range-based for loops. |
1695 ↗ | (On Diff #52832) | Why not return -1 here? |
1699 ↗ | (On Diff #52832) | Move the startsWith into containsOnlyComments. |
1701 ↗ | (On Diff #52832) | I'd write this in a very different order: if (AnnotatedLines[CurrentLine]->startsWith(tok::r_brace)) break; if (AnnotatedLines[CurrentLine]->startsWith(kw_namespaces) || ..) { int NewLine = checkEmptyNamespace(CurrentLine); if (NewLine == -1) return -1; continue; if (!containsOnlyComments(..)) return -1; |
1702 ↗ | (On Diff #52832) | It's a bit subtle that this works correctly. At least I needed to think quickly why you wouldn't falsely run into this in namespace N { void f() { } } But of course, then the namespace wouldn't be empty. Please leave a comment explaining this briefly. |
1717 ↗ | (On Diff #52832) | Hm, this seems very inelegant. If you have: namespace A { namespace B { namespace C { } } } "namespace C {" and "}" gets delete three times. At the very least you should add if (AnnotatedLines[i]->Deleted) continue; Thinking about this some more, it will get deleted even more often has even the outer loop in runFix will call checkEmptyNamespace again for the inner namespaces even though they have been deleted already. So inner namespaces will bei deleted O(N^2) times if N is the nesting depth. Lets get that down to 1 instead ;-) |
1818 ↗ | (On Diff #52832) | Can you explain why you are doing this? |
2286 ↗ | (On Diff #52832) | Don't define a typedef for something that is only used once. Also, this is an internal function, how about writing this as: template <typename T> static tooling::Replacements processReplacements(StringRef Code, const tooling::Replacements &Replaces, T ProcessFunc, const FormatStyle &Style) { } No need to spell out the function type (I think). |
- Moved constructor initializer fixer to a separate patch; pull runFixer and runFormat into separate classes, and merge common code in CodeProcessor class.
lib/Format/Format.cpp | ||
---|---|---|
1590 ↗ | (On Diff #53202) | I'm not quite sure if the name is accurate. Do you have a better name? |
1597 ↗ | (On Diff #52832) | I am not quite sure here actually. This was copied from Formatter. And since consumeUnwrappedLine requires at lease one element in UnwrappedLines, I figured this is necessary initialization? |
1818 ↗ | (On Diff #52832) | I want to reduce the number of replacements so that when we do reformat on the fixed code, there could be fewer changed code ranges considering that the current implementation of computeAffectedLines is not that efficient when we have many ranges. |
2286 ↗ | (On Diff #52832) | Wow, this is really helpful! Thanks! |
include/clang/Format/Format.h | ||
---|---|---|
800 ↗ | (On Diff #53202) | Same as above, "fix" is probably not the right word. "cleanup" seems somewhat better. Here and everywhere else in this patch :-/. |
lib/Format/Format.cpp | ||
1446 ↗ | (On Diff #53202) | Maybe call this AffectedRangeManager? That is the class's very specific purpose. |
1590 ↗ | (On Diff #53202) | I think the difficulty of finding a name somewhat stems from the fact that it does multiple things at once. I think the main purpose is to encapsulate the "flattening" of the different #if/#else branches in successive runs. Manuel, do you have an idea for a good name here? |
1607 ↗ | (On Diff #53202) | It feels like these two functions increase the coupling between Formatter/Fixer and CodeProcesser and aren't really necessary. Can you just inline them at the locations where they are called? |
1669 ↗ | (On Diff #53202) | Lets not have this access. Provide getters. |
1908 ↗ | (On Diff #53202) | I see why you are doing it this way, but let me propose something else: How about iterating over all of the lines for each thing to cleanup (namespaces, ctor-intializers, etc.). I don't think it makes a significant difference in the runtime. Then you can just skip all the lines that you have already looked at (i.e. the returned int). Among other things, you can probably remove the duplication of the AnnotatedLines[CurrentLine]->startsWith(..) and maybe even the AnnotatedLine::Deleted field. |
lib/Format/FormatToken.h | ||
283 ↗ | (On Diff #53202) | Do you need this? Wouldn't it be enough to add them to the set of deleted tokens and mark them as Finalized? |
lib/Format/Format.cpp | ||
---|---|---|
1677 ↗ | (On Diff #53370) | I think FormatStyle in the Processor should be const and Formatter should make its own copy to modify it. |
1926 ↗ | (On Diff #53370) | Why are you introducing a return value that you aren't using? I'd just return the number of subsequent lines to skip. Overall, I'd do this a bit differently:
That way, you don't need to store DeletedLines and you don't ever look at the same line multiple times. |
1927 ↗ | (On Diff #53370) | What about namespace A { // really cool namespace ? |
1987 ↗ | (On Diff #53370) | I believe that this doesn't work in many circumstances. Not sure I can easily construct a test case for the namespace cleanup, but generally, AnnotatedLines are not in a strict order. The last token of a line isn't necessarily next to the first token of the next line. This happens, e.g. for nested blocks and if preprocessor lines are intermingled with other lines. |
1990–1993 ↗ | (On Diff #53370) | I think this should hardly matter considering that we are probably only going to have a few instances of cleanup replacements and the range calculation is not that inefficient. |
2224 ↗ | (On Diff #53370) | Do we also need that if we spell out the ProcessFunc above? Maybe that's actually less painful compared to writing these lambdas? |
- Do not merge multiple lines when generate fixes. Added test cases with comments around namespace.
- Refactor: pull Annotator into Formatter/Cleaner from CodeProcessor. Moved AffectedRangeManager to a separate file.
include/clang/Format/Format.h | ||
---|---|---|
769 ↗ | (On Diff #53866) | cleanupAroundReplacements sounds good. |
lib/Format/Format.cpp | ||
1466 ↗ | (On Diff #53866) | Nit: I think it's idiomatic to call this a "Callback". |
2143 ↗ | (On Diff #53866) | This is unexpected: I'd not have expected the CodeProcessor to be used to keep state apart from what's used during a Process() run. I think we'll need a different name. |
2176 ↗ | (On Diff #53866) | The way the Processor is passed into the callbacks so they can call stuff on the processor again makes me think this is really tightly coupled and should use inheritance. Alternatively, we might want to try to break up the CodeProcessor into a part that just runs the callback and keeps the state for running the callback around, and an interface that is used by the callbacks - I'll need to look more closely into this though. |
lib/Format/Format.cpp | ||
---|---|---|
1450 ↗ | (On Diff #53866) | After pondering a bit more, it seems like CodeProcessor is too generic: this is more something that runs analyses based on annotated tokens, so I'd call it TokenAnalyzer or something. That would also make it obvious to push some of the duplicated functionality from Formatter and Cleaner down (namely everything up to where we have generated the annotated tokens). I'm still not completely convinced either way regarding inheritance vs. composition, but if we have a TokenAnalyzer, we'd also have a nice is-a relationship. While I generally agree to prefer composition, I also think we need to look out for where inheritance actually makes things clearer. |
- Rebased
- Make Formatter and Cleaner inherit from TokenAnalyzer (new name for CodeProcessor).
lib/Format/Format.cpp | ||
---|---|---|
1544 ↗ | (On Diff #54476) | Move this into the base class? |
2097 ↗ | (On Diff #54476) | Instead, create a class Environment that does all of these and makes SourceMgr, ID and CharRanges available via getters (maybe more). Then you should be able to just instantiate an environment and call the other reformat/cleanup function with the corresponding arguments. |
We'll probably want Daniel to also take another look over it, as it's a pretty substantial change that will haunt us for a while, but I think this now pretty much looks like I expect it to look.
lib/Format/Format.cpp | ||
---|---|---|
1509 ↗ | (On Diff #54837) | Don't name it SourceMgr, as there is a class of that name. Unfortunately we'll probably want to name it SM, like everywhere else in clang :( |
1454 ↗ | (On Diff #54848) | Won't deleting this line have the same effect? |
1489–1490 ↗ | (On Diff #54848) | There is no base class any more. |
1872 ↗ | (On Diff #54848) | In LLVM TODOs are spelled FIXME. |
@Daniel, sorry that I forgot to have you look at the final version before
submitting it...
cfe/trunk/lib/Format/AffectedRangeManager.h | ||
---|---|---|
59 | And an empty line between functions and data members here. | |
66 | Fix header guard comment | |
cfe/trunk/lib/Format/Format.cpp | ||
1355 | What happened here? | |
1463 | Hand in CharRanges as ArrayRef or const vector&. | |
1471 | Why don't you do this in the constructor? Seems asymmetric to just call a constructor in the one codepath but a factory function in the other one. | |
1519 | Would it be sufficient to return a const SourceManager&? I guess not, but I'd like to understand where it breaks. If we could return a const source manager here, that would enable us to pass around a "const Environment&" at a few places. |
cfe/trunk/lib/Format/Format.cpp | ||
---|---|---|
1355 | This line exceeded 80 characters and was formatted by clang-format when I ran clang-format across it...but I guess this change was out of the scope of this patch...sorry about that. | |
1471 | We have a reference member "SourceManager &SM" that can only be initialized after setting up the environment...I didn't know if constructor would work in this case. | |
lib/Format/Format.cpp | ||
1544 ↗ | (On Diff #54476) | The way we calculate affected ranges are the same for Cleaner and Formatter. But in the future, we might want more accurate affected ranges for Cleaner. |
And an empty line between functions and data members here.