[clang-format] Reorganize raw string delimiters
Needs ReviewPublic

Authored by krasimir on Wed, Dec 6, 9:34 AM.



This patch reorganizes the way raw string formats are handled:

  • all styles defined in a .clang-format file per language are propagated to the formatting library for the purposes of formatting code blocks
  • the raw string styles are organized around the language, with a list of delimiters and enclosing function names matching that language
  • there is an optional canonical delimiter per language, which can be used to update other delimiters for that language to the canonical one.

Diff Detail

krasimir created this revision.Wed, Dec 6, 9:34 AM
krasimir updated this revision to Diff 125915.Thu, Dec 7, 2:33 AM
  • Added unsafe canonical delimiter update handling
krasimir updated this revision to Diff 125921.Thu, Dec 7, 3:09 AM
  • Added support for enclosing function names
krasimir updated this revision to Diff 125927.Thu, Dec 7, 3:34 AM
  • Updated documentation
krasimir edited the summary of this revision. (Show Details)Thu, Dec 7, 3:39 AM
krasimir added a reviewer: djasper.
djasper added inline comments.Thu, Dec 7, 5:25 AM

Lets find a way to implement without this in the public header file.


Can you pull apart this patch?

In my view, it has three parts that have an ordering, but are actually fairly independent:

  1. Propagate all configured languages to the formatting library. First patch to land, should not affect the visible behavior.
  2. Restructure RawStringFormat to be centered around each language. This is a restructuring to make things easier and use #1.
  3. Add a CanonicalDelimiter and make clang-format canonicalize it.

I'll focus my comments on what's required for #1 for now as that is already complicated (IMO).


Prefer early exit, i.e.

if (!LanguageFound)
  return make_error_code(ParseError::Unsuitable);

I think this is getting a bit convoluted and I don't even understand whether we are doing what is document (even before this patch).

So in lines 892-905, we verify that:

  • Only the first Style in the file is allowed be LK_None.
  • No language is duplicated.

That seems good.

According to the documentation: "The first section may have no language set, it will set the default style options for all lanugages.". Does the latter part actually happen? Seems to me that we are just setting "Style" to the style configured for a specific language, completely ignoring values that might have been set in the LK_None style. Or is that somehow happening when reading the JSON?

Independent of that, I think we should use this structure more explicitly. I think we should create an additional class (FormatStyles or FormatStyleSet or something) that is returned by this function and handed to the formatting library. This function then doesn't need to look at the language anymore.

That class should then have a function getFormatStyle(LanguageKind Language); that returns the style for a particular language (doing the default logic, etc.). Internally, it can likely just have a map<LK, Style> and I don't think you need to pre-fill that for all language kinds. If a language kind is not in the map, you can just return what's stored for LK_None. WDYT?

krasimir added inline comments.Thu, Dec 7, 5:32 AM

I believe these should all go together: the reason that we propagate all configured languages to the formatting library is to be able to use them as a replacement for the BasedOnStyle in RawStringFormat. To make this possible, we need to update the internal structure of RawStringFormat itself to base it around each language. The canonical delimiter part is just a convenience for this I guess, which could be split.

My biggest concern with (1) is that since it has no visible behavior and no other uses except for the adaptation of (2), it is not testable by itself and it's not evident that a patch doing just (1) would handle the things correctly.

krasimir added inline comments.Thu, Dec 7, 5:43 AM

Yes, defaulting to the None for missing language specifications is handled at line 912:

if (!LanguageFound && (Styles[i].Language == Language ||
                       Styles[i].Language == FormatStyle::LK_None

I was thinking of the FormatStyleSet approach but the problem is that this has repercussions all over the library. We could indeed update this specific function that way, but fundamentally the additional language styles are part of the FormatStyle and need to somehow be recorded inside there. That's why I went with KISS and directly made this function handle that case.

krasimir updated this revision to Diff 125939.Thu, Dec 7, 5:52 AM
krasimir edited the summary of this revision. (Show Details)
  • Address review comments
krasimir marked 2 inline comments as done.Thu, Dec 7, 5:53 AM
djasper added inline comments.Fri, Dec 8, 2:03 AM

Ok, if you wish, this is not an unreasonable argument. But let's still do the code review in two steps: Lets for now just get the part of handling multiple languages straight and figure out the rest once we are sure that that part is fine.

(I do think you can test it, though - but it depends on whether I can convince you to go with the FormatStyleSet approach ;) )


But it's not just defaulting to LK_None what we are saying we are implementing. I think the documentation suggestion that we implement very basic inheritance. E.g. if the style for LK_None set the ColumnLimit to 42, I would expect that the styles for all other languages that don't explicitly set a ColumnLimit would also use 42. I don't think this is currently implemented and I don't think this patch implements it. But I think we should :).

I agree that the FormatStyleSet approach would have some consequences, but I also think that it is much cleaner. Your current solution feels like we us working around technical debt and creating more technical debt to do it :(. Maybe Manuel has thoughts here?

klimek added inline comments.Fri, Dec 8, 2:33 AM

On a philosophical level, something that has no visible behavior, and just restructures the code, should be tested by existing tests?

Enclosing function names also seems like an extra feature that could be pulled out, btw.


I agree that we should test the inheritance if it's documented :)

I don't have super strong feelings where the logic of implementing that inheritance lives - both the function that parses the data and the data structure we hand around seem fine for that, as it's easy to change

That said, I do agree that FormatStyle being a recursive graph data structure is weird and unexpected; I had to ask Daniel what he actually meant here, and then went "how does this even work??!" before realizing that FormatStyle is both a single style and also containing all other styles.

-> in conclusion, I agree with Daniel: we should have a FormatStyleSet and pass that around everywhere. That's a pure refactoring that looks like it would already make the current code better, so I also agree it should be done first and be covered by existing tests.