This is an archive of the discontinued LLVM Phabricator instance.

Reflow paragraphs in comments.
ClosedPublic

Authored by aprantl on Apr 26 2018, 2:30 PM.

Details

Summary

This is intended as a clean up after the big clang-format commit (r280751), which unfortunately resulted in many of the comment paragraphs in LLDB being very hard to read.

FYI the script I used was:

import textwrap
import commands
import os
import sys
import re
tmp = "%s.tmp"%sys.argv[1]
out = open(tmp, "w+")
with open(sys.argv[1], "r") as f:
  header = ""
  text = ""
  for line in f:
      comment = re.compile(r'^( *//) ([^ ].*)$')
      match = comment.match(line)
      if match:
          # skip intentionally short comments.
          if not text and len(match.group(2)) < 40:
              out.write(line)
              continue

          if text:
              text += " " + match.group(2)
          else:
              header = match.group(1)
              text = match.group(2)
          
          continue
      else:
          if text:
              filled = textwrap.wrap(text, width=(73-len(header)),
                                     break_long_words=False)
              for l in filled:
                  out.write(header+" "+l+'\n')
              text = ""

      out.write(line)

os.rename(tmp, sys.argv[1])
#find source -name '*.cpp' -exec python join-comments.py \{} \;

Diff Detail

Repository
rL LLVM

Event Timeline

aprantl created this revision.Apr 26 2018, 2:30 PM

The formatting of the comments has been bugging me as well. So far, I have been trying to reflow them whenever I touch the code around them, but at that rate fixing all of them would take ages.

Your heuristics seem to be working fairly well. I've found a couple of sub-optimal decisions in the few files I opened, but I am not sure if it's feasible to fix the algorithm to account for them.

So overall, this seems like a good idea to me, and the few places where it gets this wrong can be fixed up afterwards (it decreases the total number of badly formatted comments).

PS: I don't see any header files in the list. Is there a particular reason for skipping them?

source/API/SBDebugger.cpp
73 ↗(On Diff #144194)

Folding TODO: Here is sub-optimal.

source/API/SBInstruction.cpp
37–38 ↗(On Diff #144194)

This doesn't look right.

aprantl updated this revision to Diff 144420.Apr 27 2018, 4:46 PM

Updated the script based on Pavel's feedback:

import textwrap
import commands
import os
import sys
import re
tmp = "%s.tmp"%sys.argv[1]
out = open(tmp, "w+")
with open(sys.argv[1], "r") as f:
  header = ""
  text = ""
  comment = re.compile(r'^( *//) ([^ ].*)$')
  special = re.compile(r'^(([A-Z]+[: ])|([0-9]+ )).*$')
  for line in f:
      match = comment.match(line)
      if match and not special.match(match.group(2)):
          # skip intentionally short comments.
          if not text and len(match.group(2)) < 40:
              out.write(line)
              continue

          if text:
              text += " " + match.group(2)
          else:
              header = match.group(1)
              text = match.group(2)
          
          continue

      if text:
          filled = textwrap.wrap(text, width=(77-len(header)),
                                 break_long_words=False)
          for l in filled:
              out.write(header+" "+l+'\n')
              text = ""

      out.write(line)

os.rename(tmp, sys.argv[1])
#find source -name '*.cpp' -exec python join-comments.py \{} \;
labath accepted this revision.Apr 30 2018, 6:06 AM

Looks fine to me, I guess if noone objects, we can land this. Thanks for doing this.

This revision is now accepted and ready to land.Apr 30 2018, 6:06 AM
clayborg accepted this revision.Apr 30 2018, 8:11 AM
JDevlieghere accepted this revision.Apr 30 2018, 8:25 AM

👍

source/Host/common/MainLoop.cpp
158 ↗(On Diff #144420)

Is this intentional or maybe an off-by-one error? The a is in the 79th colum, which is okay for clang-format.

aprantl added inline comments.Apr 30 2018, 9:00 AM
source/Host/common/MainLoop.cpp
158 ↗(On Diff #144420)

My script sets the threshold to 78 columns. When you are on an 80-column terminal and your editor doesn't indicate whether long lines continue or not then not using the last two characters makes it unambiguous whether the a is the beginning of a new word or whether it stands for itself.
I'll change the threshold to 80 columns to avoid reflowing paragraphs that don't need it.

This revision was automatically updated to reflect the committed changes.