Page MenuHomePhabricator

format::cleanupAroundReplacements removes whole line when Removals leave previously non-blank line blank
Needs ReviewPublic

Authored by poelmanc on Oct 8 2019, 8:02 PM.



This change benefits clang-format, clang-tidy, clang-change-namespace, and others. When a sequence of Removal Replacements (excluding Insertions or non-empty Replacements) leaves a previously non-blank line with nothing but whitespace, this patch adds a Removal to remove the entire line.

The initial impetus for this change was clang-tidy, which usually can safely ignore formatting concerns as formatting is clang-format's job. However, clang-format treats blank lines as intentional separators, so any clang-tidy fixes that remove all text from lines and leave newly blank lines can prevent clang-format from properly fixing formatting. For example, applying readability-redundant-member-init to:

struct F10 {
  F10() :
  S f, g;

results in:

struct F10 {

  S f, g;

which may get converted to:

struct F10 {

= default;
  S f, g;

The newly blank lines prevent clang-format from fixing this to the desired:

struct F10 {
  F10() = default;
  S f, g;

A few individual clang-tidy fixes take steps to reduce some newly blank lines, e.g. RedundantControlFlowCheck.cpp calls Lexer::findLocationAfterToken(.../*SkipTrailingWhitespaceAndNewLine=*/true ) to chomp trailing newlines after redundant continue; or break; statements. But relying on each check to individually address the problem:

  1. does not check whether the whole line is blank before removing newlines, as that can only be known once all Replacements on the line are known
  2. can fail to notice when multiple removals combine to generate a blank line
  3. forces more work for each check writer
  4. results in inconsistency between checks regarding removals creating newly blank lines

For example:

: a(), b()

is fixed by readability-redundant-member-init with independent removals to remove the "a()" and "b()" and the ":" is finally removed in cleanupAroundReplacements, so only there can the code know the result will be blank.

This patch adds a function removeNewlyBlankLines to Format.cpp that looks for sequences of Replacement removals that leave a previously non-blank line blank and, where found, adds a Replacement to remove the whole line. removeNewlyBlankLines is called from cleanupAroundReplacements().

The patch adds tests to clang/unittests/Format/CleanupTest.cpp and clang-tools-extra/test/checkers/readability-redundant-{member-init,control-flow,declaration}.cpp, plus updates numerous tests in ChangeNamespaceTests.cpp and elsewhere to reflect the improved line-removing behavior.

Two previously private/static helper functions were moved to clang/base/CharInfo.h for use in the implementation: AST/CommentLexer's skipNewline and AST/CommentParser's isWhitespace(StringRef) (the former was also renamed to skipNewlineChars to avoid a name clash.)

Diff Detail

Unit TestsFailed

50 msx64 debian > Flang.Semantics::resolve102.f90
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/flang/test/Semantics/ /mnt/disks/ssd0/agent/llvm-project/flang/test/Semantics/resolve102.f90 /mnt/disks/ssd0/agent/llvm-project/build/tools/flang/test/Semantics/Output/resolve102.f90.tmp /mnt/disks/ssd0/agent/llvm-project/build/bin/f18 -intrinsic-module-directory /mnt/disks/ssd0/agent/llvm-project/build/tools/flang/include/flang

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
gribozavr2 added inline comments.Nov 11 2019, 8:41 AM

Why std::list? std::vector seems more appropriate to me.

poelmanc updated this revision to Diff 228795.Nov 11 2019, 6:45 PM
poelmanc edited the summary of this revision. (Show Details)

Move isWhitespace and skipNewlines to clang/Basic/CharInfo.h (renaming to isWhitespaceStringRef and skipNewlinesChars to resolve name clashes) and add double-quotes around "/n" and "/r" in comments.

poelmanc marked 4 inline comments as done.Nov 11 2019, 6:55 PM
poelmanc added inline comments.

Done. I renamed to skipNewlineChars to remind callers it might skip multiple (1 or 2) characters, and to avoid a name clash with very similar function skipNewline helper function at DependencyDirectivesSourceMinimizer.cpp:228. That version modifies its first argument and returns the number of characters skipped (0, 1, or 2.)

We could alternatively remove DependencyDirectivesSourceMinimizer.cpp's local skipNewline() helper function and call this version, but one location (line 370) would require a few lines of logic that I don't know how to test.


Done. I renamed it to isWhitespaceStringRef to avoid making it an overload of the existing isWhitespace(unsigned char), which causes ambiguous overload errors at QueryParser.cpp:42 & CommentSema.cpp:237.

We could alternatively keep this as an isWhitespace overload and instead change those two lines to use a static_cast<bool (*)(unsigned char)>(&clang::isWhitespace) or precede them with a line like:

bool (*isWhitespaceOverload)(unsigned char) = &clang::isWhitespace;

The changes are not directly related to clang-tidy, so could you get rid of that clang-tidy some-checker-specific tests and put some unittests into clang/unittests/Format/CleanupTest.cpp ?


use '\' instead of '/' ?


formatting seems to be off, have you run clang-format on your changes?


could you make rename this to isWhitespace and move it closer to isWhitespace(unsigned char c)


ah that's unfortunate, I believe it makes sense to have this as an overload though.

Could you instead make the predicate explicit by putting

[](char c) { return clang::isWhitespace(c); }

instead of just clang::isWhitespace in those call sites?

also could you rename the revision so that it reflects the fact that, this is a change to clang-format and has nothing to do with clang-tidy ?

poelmanc updated this revision to Diff 230375.Nov 20 2019, 11:23 PM
poelmanc marked 3 inline comments as done.
poelmanc retitled this revision from Clang-tidy fix removals removing all non-blank text from a line should remove the line to format::cleanupAroundReplacements removes whole line when Removals leave previously non-blank line blank.
poelmanc edited the summary of this revision. (Show Details)

I addressed the latest code review comments, added tests to clang/unittests/Format/CleanupTest.cpp, and updated numerous tests to reflect improved removal of blank lines.

I now realize how widely used cleanupAroundReplacements is. That's great as this fix improves clang-change-namespace as a side-effect. However, I have little experience running tests beyond clang-tidy or on Linux (I'm testing with MSVC) and would appreciate help identifying any test failures or regressions. Also please check my change to FixOnlyAffectedCodeAfterReplacements in clang/unittests/Format/CleanupTest.cpp.

poelmanc marked 5 inline comments as done.Nov 20 2019, 11:26 PM
poelmanc added inline comments.

Excellent! Better than the two ideas I thought of. Thanks!

poelmanc marked an inline comment as done.Nov 20 2019, 11:31 PM

also could you rename the revision so that it reflects the fact that, this is a change to clang-format and has nothing to do with clang-tidy ?

Done but feel free to suggest a better title.

(This journey started out with me trying to fix what I thought was a small quirk in clang-tidy's readability-redundant-member-init. Great feedback from more experienced developers along the way steered me to cleanupAroundReplacements, which feels perfect, but now I realize that gives it a broader impact.)

Thank you so much for the review and feedback!

ToolingTests/ApplyAtomicChangesTest.FormatsCorrectLineWhenHeaderIsRemoved also seems to be failing, you can run it with ninja ToolingTests && ./tools/clang/unittests/Tooling/ToolingTests --gtest_filter="ApplyAtomicChangesTest.FormatsCorrectLineWhenHeaderIsRemoved"


maybe assign FilePath = Replaces.begin()->getFilePath() here, and early exit in the beginning if Replaces is empty?
Then you can just assert(FilePath == R.getFilePath())


no need to re-assign at each loop.


it seems like this loop is trying to find line start, would be nice to clarify with a comment.

If so seems like the logic is a little buggy, for example:

int a^; if the offset is at ^(namely at a) then this will return current position as line start, since a is preceded by a whitespace.
Maybe change the logic to find the first \n before current offset, and skip any whitespaces after that \n (if really necessary)

btw, please do explain why you are skipping the preceding whitespaces in the comment.

could you also add a test case for this one?


this logic might not work in cases like:

in^t a;
int^ b;

let's assume you've got replacements at places marked with ^ again, then they would have same start position so you would miss the line change event.
maybe count number of \ns in the previous loop to detect current line number.

Also please add a test for this case as well.


maybe make it an early skip condition at the beginning of the loop?


the second loop seems like an over-kill and complicates the code a lot.

IIUC, it is just checking whether any text after replacements is "whitespace/newline" and if so deletes the whole line.
Instead of doing all of this, in the previous loop, whenever we see a line change could we scan until the EOL, and delete the EOL token if we didn't hit anything but whitespaces?

this would mean you won't need to store PotentialWholeLineCuts.



if(llvm::Error Err = NewReplaces.add(
          tooling::Replacement(FilePath, LineStartPos, CutCount, "")))
  return std::move(Err);

could you replace these with raw string literals instead to get rid of \ns.


can you make use of Annotations library in llvm/include/llvm/Testing/Support/Annotations.h for getting offsets into the code?

same for the following test cases

alexfh added inline comments.Nov 28 2019, 8:15 AM

I'd suggest to avoid overloading here. A name like isWhitespaceOnly would be less ambiguous.

As for implementation, it can be less verbose:

for (char C : S)
  if (!isWhitespace(*I))
    return false;
return true;

or just

return llvm::all_of(S, isWhitespace);

I'd like to draw your attention that isWhitespace(StringRef) is somewhat ambiguous. What exactly does it check? That the string is exactly one whitespace character? That it contains a whitespace? That it's only contains whitespace? Maybe isWhitespaceOnly? See also the comment in CharInfo.h.

poelmanc updated this revision to Diff 234976.Dec 20 2019, 1:54 PM

Address most of the feedback, I'll comment individually.

poelmanc updated this revision to Diff 235090.Dec 22 2019, 8:31 PM
poelmanc marked 2 inline comments as done.

Fix algorithm to fix failing ToolingTests/ApplyAtomicChangesTest.FormatsCorrectLineWhenHeaderIsRemoved. That test revealed a very specific but real failure case: if existing Removals removed all whitespace including the line's ending newline, and the subsequent line contained only a newline, it would delete the subsequent newline too, which was not desired. Added a new test CleanUpReplacementsTest.RemoveLineAndNewlineLineButNotNext to explicitly test this case.

poelmanc marked 13 inline comments as done.Dec 22 2019, 8:51 PM
poelmanc added inline comments.

Thank you for the detailed review of the algorithm and the suggestions, sorry it took me so long to get back to this. Done.


Good point, I clarified by refactoring this code into a simple helper function FindLineStart.

I don't believe there's buggy logic but the term vertical whitespace may have caused confusion - it includes only newline characters (linefeed & carriage returns) whereas whitespace includes those plus spaces and tabs. Now that the function is called FindLineStart the logic is hopefully clear.


I'm not sure I understand this concern; if you could draft up a specific test case I'd be happy to ensure it works.


I agree and thanks for the feedback! PotentialWholeLineCuts felt a bit hacky so I've eliminated it by creating a new MaybeRemoveBlankLine helper function that gets called in two places, eliminating the need for the second pass and the PotentialWholeLineCuts variable. Hopefully the code is easier to follow now.


I like that modernization suggestion but as a fairly new llvm contributor I follow the style I see in the surrounding code. Perhaps we should submit a seperate patch to modernize whole files at a time to use raw string literals and see how the change is received by the community.


Annotations looks quite handy, but again I'd rather get this patch through consistent with the file's existing style and submit a separate patch to modernize a whole file at a time to use Annotations.

poelmanc marked 4 inline comments as done.Dec 22 2019, 8:55 PM

ToolingTests/ApplyAtomicChangesTest.FormatsCorrectLineWhenHeaderIsRemoved also seems to be failing, you can run it with ninja ToolingTests && ./tools/clang/unittests/Tooling/ToolingTests --gtest_filter="ApplyAtomicChangesTest.FormatsCorrectLineWhenHeaderIsRemoved"

Thank you, this test revealed a real edge case for which the algorithm was failing, which I've now fixed and created a new test case to verify: If a set of Replacements removes all non-whitespace on a line including the trailing newline and the line is followed by a single newline character, it should not chomp the following newline as well.

poelmanc marked an inline comment as done.Dec 22 2019, 8:58 PM
poelmanc added inline comments.

Thanks for the suggestion, I've removed the overloading and updated the implementation.

kadircet marked 2 inline comments as done.Jan 3 2020, 4:35 AM
kadircet added inline comments.

can we rather make this function return a tooling::Replacement ?


name should be maybeRemoveBlankLine


instead of passing LineBlankSoFar as a parameter, maybe don't call the function at all if it doesn't hold?

if(!isAllWhiteSpace(Code.substr(CheckedThroughPos, LineEndPos - CheckedThroughPos))
  return llvm::Error::success();
if(isAllWhiteSpace(Code.substr(LineStartPos, CheckedThroughPos - LineStartPos))
  return llvm::Error::success();

Also move this to the top, as it doesn't actually require any of the computations you performed.

I would actually keep a HasDeleted flag in the caller, updating it by looking at length of replacements, and call this function iff we've deleted something in that line, but up to you.


i don't follow what this check is about.

the comment talks about a replacement removing a trailing newline, but check is for a preceding whitespace, and it is not even at the LineEndPos ?
how come we get a vertical whitespace, in the middle of a line?


name should be findLineStart


nit: no need to copy the parameter(i.e. creating a new variable)


ah thanks, I somehow missed the vertical part.


nvm this one, it was a consequence of missing vertical


both LineStartPos and LineCheckedThroughPos would be -1 for the first replacement in the file. and you will use them to index into Code.

I believe it makes sense for those to be initialized to 0.


nit: inline the variable, i.e R.getReplacementText().empty()


nit: RStartPos + R.getLength();


modernization of the existing code, and introducing old code are different things. former is harder because we would like to keep the history clean, but we tend to do the latter to make sure quality improves over time.

feel free to keep it if you want though, this one isn't so important.


This one is not a style choice. Annotations was basically not available at the time the other tests were written, and it makes the tests a lot more readable.
there's no easy way to figure out what is meant by all those offsets entered as numbers. So please make use of them in here, and I would really appreciate if you have time to send a follow-up patch to convert the rest of the tests in the file.

poelmanc updated this revision to Diff 320678.Feb 1 2021, 10:41 PM
poelmanc changed the repository for this revision from rCTE Clang Tools Extra to rG LLVM Github Monorepo.

Glad to be back after a year away from clang-tidy, and sorry to have let this patch linger. Running clang-tidy over a large codebase shows this patch still to be needed. I believe I've addressed all identified issues but welcome feedback.

@gribozavr @gribozavr2 requested changes to a very early version of this patch and I later reworked it to follow the cleanupAroundReplacements approach he suggested on 2019-10-11.

poelmanc marked 14 inline comments as done.Feb 1 2021, 11:10 PM
poelmanc added inline comments.

maybeRemoveBlankLine needs to be called twice so I tried to put maximum logic in it so to avoid replicating logic at the two calling sites. It doesn't always create a replacement, so I believe we could return optional<tooling::Replacement>. Then the caller would have to check the optional, and if set call NewReplaces.add(), then check that result... I can do that if you like but please take a fresh look at the updated implementation. Thanks!


Coming back after a year I found my own code here a bit hard to read... Your feedback prompted me to use StringRef over char* and reduce the number of intermediate variables. I put your two suggested isAllWhiteSpace calls isAllWhiteSpace together with &&, feel free to suggest further improvement.


Used suggested isAllWhitespace together in the if clause with above isAllWhitespace.


I clarified the comment to include an example: when the code is ... foo\n\n... and foo\n is removed by a replacement already, the second \n actually represents the next line which we don't want to remove.


I tried raw string literals, but these particular tests are all about tabs, spaces, and newlines which I found harder to see and reason about using raw string literals. I kept the raw string literals in RemoveLinesWhenAllNonWhitespaceRemoved.


Thanks, I figured out the Annotations pretty quickly and it definitely is easier to read and reason about.

poelmanc marked 5 inline comments as done.Feb 1 2021, 11:20 PM
poelmanc updated this revision to Diff 321451.Feb 4 2021, 8:36 AM

Change loop end condition in findLineEnd and add several assert statements.

poelmanc marked an inline comment as done.Feb 4 2021, 5:08 PM

On 2 Feb Harbormaster found a bug from my switching some char* code to use StringRef. I uploaded a new patch on 4 Feb, but Harbormaster does not appear to have rerun. What triggers Harbormaster - do I need to take some action? I haven't been able to find such options myself.

Please holler if there's a better place to ask this, thanks!

It looks like premerge tests skipped your last diff with id 321451 ( You can re-trigger by uploading a new diff, in the meantime i would also file a bug to mentioning your diff id.

poelmanc updated this revision to Diff 322033.Feb 7 2021, 10:51 PM

Update one one comment, hopefully trigger HarborMaster to rerun.

poelmanc set the repository for this revision to rG LLVM Github Monorepo.Feb 8 2021, 7:43 AM
poelmanc updated this revision to Diff 322108.Feb 8 2021, 7:46 AM

Comment change, "beginning" to "start" for consistency, being sure to set Repository on "diff" page (not just on edit page) to see if was the problem.

poelmanc updated this revision to Diff 322265.EditedFeb 8 2021, 6:44 PM

Call llvm::Annotation constructor rather than operator= to fix linux build issue, fix some issues identified by clang-format and clang-tidy.

poelmanc added a comment.EditedFeb 8 2021, 9:04 PM

It looks like premerge tests skipped your last diff with id 321451 ( You can re-trigger by uploading a new diff, in the meantime i would also file a bug to mentioning your diff id.

Thanks @kadircet, actually just reviewing the current bug list explained the problem:, Build is not triggered if diff repository is not set. I wasn't selecting rG LLVM Github Monorepo per instructions at that said Leave the Repository field blank.

HarborMaster has now run and passes all tests on Windows and Linux.

  1. clang-format objects to a new test case in clang-tools-extra/test/clang-tidy/checkers/readability-redundant-member-init.cpp. That test case formatting matches real-world examples so I'm reluctant to change it - I believe it's clang-formatted but with different settings from llvm-project's.
  2. clang-tidy finds 8 int vs size_t comparisons. While int feels safer when subtraction is involved (since one can assert variables stay positive), this particular code would work fine with int variables changed to size_t, or alternatively with static_cast to silence the warnings.
poelmanc updated this revision to Diff 326957.Feb 28 2021, 2:06 AM

Removed a function in modernize/LoopConvertCheck.cpp that's no longer needed due to this patch. Fixed clang-format and clang-tidy issues identified by HarborMaster in prior submission.

Hopefully it's ready to go, but as always I'm happy to receive any feedback.