This is an archive of the discontinued LLVM Phabricator instance.

[clang][Diagnostics] Refactor printableTextForNextCharacter
ClosedPublic

Authored by tbaeder on May 17 2023, 10:55 PM.

Details

Summary

Rename parameters and local variables to start with an uppercase characters.

This is not 100% NFC, since it adds an easy/fast path for 1 byte printable characters and changes how much text is converted to UTF32.

Diff Detail

Event Timeline

tbaeder created this revision.May 17 2023, 10:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 17 2023, 10:55 PM
tbaeder requested review of this revision.May 17 2023, 10:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 17 2023, 10:55 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
tbaeder added inline comments.May 17 2023, 10:56 PM
clang/lib/Frontend/TextDiagnostic.cpp
120

I don't know what this computation of end means, but from the debug output I added, it meant that a call to this function always converted the entire line. I think.

This generally looks good to me but i have a couple remarks

clang/lib/Frontend/TextDiagnostic.cpp
119–123

this could be simplified : the common case for ascii could be just looking at isPrint(*Begin); (which implies CharSize == 1 and llvm::isLegalUTF8Sequence(Begin, End))
So you could do it before computing CharSize

122

I think we should check that CharSize is not greater than Begin + SourceLine.size() - *i or something along those lines otherwise we may overflow SourceLine.

tahonermann requested changes to this revision.May 18 2023, 2:30 PM

Splitting this into two patches (one to do the renames, another to perform the other changes) would simplify review.

clang/lib/Frontend/TextDiagnostic.cpp
117–118
120

This could assign End to a location past the end of the source line when the source line ends with an ill-formed code unit sequence (e.g., a truncated 4-byte sequence). Constructing such a pointer is likely to get one in trouble.

120

It looks to me like an overly complicated way of asking for SourceLine.bytes_end().

This revision now requires changes to proceed.May 18 2023, 2:30 PM
tbaeder added inline comments.May 18 2023, 11:12 PM
clang/lib/Frontend/TextDiagnostic.cpp
120

If we use bytes_end() (unconditionally) we will check the entire source line every time we call this function (which always happens in a loop), which is what I was trying to avoid.

tbaeder updated this revision to Diff 525010.May 23 2023, 10:45 PM
tbaeder marked 2 inline comments as done.
tbaeder added inline comments.
clang/lib/Frontend/TextDiagnostic.cpp
119–123

This is not true in my testing fwiw.

cor3ntin added inline comments.May 24 2023, 1:53 AM
clang/lib/Frontend/TextDiagnostic.cpp
119–123
const unsigned char *Begin = SourceLine.bytes_begin() + *I;

  // Fast path for the common ASCII case.
  if (*Begin < 0x80 && llvm::sys::locale::isPrint(*Begin)) {
    ++(*I);
    return std::make_pair(SmallString<16>(Begin, Begin + 1), true);
  }

seems to work fine locally. Note that I'm not sure *Begin is always valid - it seems to be, but we might want an assert to check that SourceLine is not empty.

tbaeder added inline comments.May 24 2023, 2:09 AM
clang/lib/Frontend/TextDiagnostic.cpp
119–123

This function is only ever called in a while (I < SourceLine.size()) loop. I've thought about refactoring this into a helper struct that keeps the index separate from the calling function to simplify callers.

tbaeder added inline comments.May 24 2023, 2:18 AM
clang/lib/Frontend/TextDiagnostic.cpp
119–123

Oh also, there are two asserts at the beginning of the function to ensure I is valid.

tbaeder updated this revision to Diff 525094.May 24 2023, 3:05 AM

Give some time for @tahonermann to have an opportunity to review again, but otherwise this looks good to me. Thanks!

clang/lib/Frontend/TextDiagnostic.cpp
119–123

you are right, if, SourceLine is empty, that first asset would always fire

cor3ntin accepted this revision.May 24 2023, 2:33 PM

The changes look good to me. I suggested two minor edits to align with parameter name changes.

The summary states that the changes are not quite NFC. In that case, we would ideally have a test that demonstrates the changed behavior. Would adding such a test be challenging?

clang/lib/Frontend/TextDiagnostic.cpp
94
101–102

The summary states that the changes are not quite NFC. In that case, we would ideally have a test that demonstrates the changed behavior. Would adding such a test be challenging?

They are NFC, it's just not 100% only a refactoring, since it adds the new ASCII-only case.

tbaeder updated this revision to Diff 529170.Jun 6 2023, 11:04 PM
tbaeder marked 2 inline comments as done.
tahonermann accepted this revision.Jun 9 2023, 12:19 PM

They are NFC, it's just not 100% only a refactoring, since it adds the new ASCII-only case.

Ok, thanks. Changes look good. I noticed one comment that appears to be incorrect; please just remove it when committing the changes.

clang/lib/Frontend/TextDiagnostic.cpp
128

This comment is not correct; Begin might still point to a non-printable character with a code point value less than 0x80. I suggest just removing the comment.

This revision is now accepted and ready to land.Jun 9 2023, 12:19 PM
tbaeder marked an inline comment as done.Jun 9 2023, 10:51 PM
This revision was automatically updated to reflect the committed changes.