This is an archive of the discontinued LLVM Phabricator instance.

[docs][clang-format] use yaml types style
ClosedPublic

Authored by FederAndInk on Aug 26 2021, 7:31 AM.

Details

Summary

Use yaml types for clang-format documentation (String, Integer, List of Strings, ...) instead of c++ types

track generated plurals in a file

Fix typo in clang/Format/Format.h

Diff Detail

Event Timeline

FederAndInk requested review of this revision.Aug 26 2021, 7:31 AM
FederAndInk created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptAug 26 2021, 7:31 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
MyDeveloperDay added a project: Restricted Project.Aug 26 2021, 7:39 AM
MyDeveloperDay requested changes to this revision.Aug 26 2021, 7:52 AM
MyDeveloperDay added a subscriber: MyDeveloperDay.

Thank you for your submission, but...

  1. This is not the way to change this file, its autogenerated from clang/include/Format/Format.h using clang/docs/tools/dump_format_style.py
  2. Why do you think it should be std::string? To be honest this file pretty much describes the YAML file format of .clang-format so actually I would suggest it saying string was more correct

I think the main problem is options like this:

Here we say its a std::vector<RawStringFormat> which doesn't really tell you much, as its actually just a YAML array of RawStringFormat records without actually telling you what a RawStringFormat record contains

RawStringFormats:
  - Language: TextProto
      Delimiters:
        - 'pb'
        - 'proto'
      EnclosingFunctions:
        - 'PARSE_TEXT_PROTO'
      BasedOnStyle: google
  - Language: Cpp
      Delimiters:
        - 'cc'
        - 'cpp'
      BasedOnStyle: llvm
      CanonicalDelimiter: 'cc'

So whilst I see that you were being consistent I kind of feel its in the wrong direction, we should be moving away from using C++ names here but more explaining what the YAML should look like.

Alas the C++ in Format.h is important because the types are needed because we are generating the documentation directly from the code. But in most of places we are using an enumeration like this one.

we are not saying clang::FormatStyle::RefernceAlignmentStyle and I'm not convinced we'd want to

I hope that helps, but thank you for the patch.

This revision now requires changes to proceed.Aug 26 2021, 7:52 AM

Thank you for your explanations, I understand now.

But as I look into clang/docs/tools/dump_format_style.py I see that it does not entirely generate clang/docs/ClangFormatStyleOptions.rst it replaces the lines between {START,END}_FORMAT_STYLE_OPTIONS

I understand your point, but as of now, the inconsistency comes from the part that is not auto-generated, are you suggesting editing dump_format_style.py to have simpler types such as string? Then how should we replace std::vector? Something like Type[] e.g. string[]?

Or maybe we should first include BasedOnStyle into dump_format_style.py. Then take care of how to render types?

What do you suggest? I am genuinely asking, as I really don't know what would be the best way to do things. Maybe we should include other people? I don't really know who to add as reviewers for that, but I think, the way to show types, should be discussed?

As for detailing RawStringFormat, it wasn't the purpose of this patch, and maybe it should have its own?

Thank you for your explanations, I understand now.

But as I look into clang/docs/tools/dump_format_style.py I see that it does not entirely generate clang/docs/ClangFormatStyleOptions.rst it replaces the lines between {START,END}_FORMAT_STYLE_OPTIONS

I understand your point, but as of now, the inconsistency comes from the part that is not auto-generated, are you suggesting editing dump_format_style.py to have simpler types such as string? Then how should we replace std::vector? Something like Type[] e.g. string[]?

Or maybe we should first include BasedOnStyle into dump_format_style.py. Then take care of how to render types?

What do you suggest? I am genuinely asking, as I really don't know what would be the best way to do things. Maybe we should include other people? I don't really know who to add as reviewers for that, but I think, the way to show types, should be discussed?

As for detailing RawStringFormat, it wasn't the purpose of this patch, and maybe it should have its own?

You are correct the file isn't 100% generated and some of it comes from another .h file too.

But now we have you interesting in making a contribution which you clearly are lets think about how we might do this.

To hook into the clang-format team I always recommend adding the #clang-format project, (which I added to this review), but also I recommend passing the review via @krasimir , @HazardyKnusperkeks , @curdeius there are some others who are here often like @owenpan and @sammccall and of course @klimek (who started all this). Please also of course add me @MyDeveloperDay I try to check the reviews daily as one of my frustrations was not being able to get things reviewed so I try to be pretty active.

From my perspective I do like the idea of substituting out the std::string and std::vector for something like string[] how about we start with something simple like trying to fix the cases for AttributeMacros (std::vector<std::string>) maybe with just simple substitution.

We can pass that via the rest of the team and see what they feel.

AttributeMacros (in configuration string[])

or something like that, I'm not convinced anyone is using this documentation to know its a std::vector!

HazardyKnusperkeks requested changes to this revision.Aug 26 2021, 11:59 AM

As the one who wrote that:

  1. Yes that part is not auto generated, because it is not in the format.h.
  2. I'm more in favor of removing the std:: from the documentation.
  3. It is for the YAML documentation, then stick with the YAML names: List of Strings is my proposal.

What about: unsigned? And enum or struct types such as BracketAlignmentStyle, ArrayInitializerAlignmentStyle, ...? Should we add something like: Enum BracketAlignmentStyle and Dictionnary BraceWrapping?

The complete list:

53 bool                              -> Boolean
18 unsigned                          -> ?
9 std::vector<std::string>           -> List of Strings
5 std::string                        -> String
4 AlignConsecutiveStyle              -> ?
2 int                                -> Integer
1 UseTabStyle                        -> ?
1 TrailingCommaStyle                 -> ?
1 string                             -> String
1 std::vector<RawStringFormat>       -> ?
1 std::vector<IncludeCategory>       -> ?...
1 SpacesInLineComment
1 SpacesInAnglesStyle
1 SpaceBeforeParensOptions
1 SpaceAroundPointerQualifiersStyle
1 SortJavaStaticImportOptions
1 SortIncludesOptions
1 ShortLambdaStyle
1 ShortIfStyle
1 ShortFunctionStyle
1 ShortBlockStyle
1 ReturnTypeBreakingStyle
1 ReferenceAlignmentStyle
1 PPDirectiveIndentStyle
1 PointerAlignmentStyle
1 OperandAlignmentStyle
1 NamespaceIndentationKind
1 LanguageStandard
1 LanguageKind
1 LambdaBodyIndentationKind
1 JavaScriptQuoteStyle
1 IndentExternBlockStyle
1 IncludeBlocksStyle
1 EscapedNewlineAlignmentStyle
1 EmptyLineBeforeAccessModifierStyle
1 EmptyLineAfterAccessModifierStyle
1 DefinitionReturnTypeBreakingStyle
1 BreakTemplateDeclarationsStyle
1 BreakInheritanceListStyle
1 BreakConstructorInitializersStyle
1 BracketAlignmentStyle
1 BraceWrappingFlags
1 BraceBreakingStyle
1 BitFieldColonSpacingStyle
1 BinPackStyle
1 BinaryOperatorStyle
1 ArrayInitializerAlignmentStyle

I'm good with words like List of Strings but I don't think we need Enum

unsigned I think Integer, I'm not sure what the code is even going to do if you supply a -ve, Warn I hope! (there's contribution idea number 2 for you right there!) ;-)

Ok, well, the reason I proposed this patch in the first place was that I have been working on a .clang-format schema (https://json-schema.org/) :) and I spotted the inconsistency. I checked, and clang-format reports an error if we give a negative value to an option expecting an unsigned. In the schema I am able to specify a minimum and I think it's appropriate to give the information to the user that it expect a positive/unsigned integer, what do you think?

Also, interesting question, how do we want to handle plural, as the formulation 'List of Types' introduces it. If we do it manually, it won't scale. We could include a dependency in python to something like inflect. I'll upload a new patch soon.

Use yaml type style for clang-format documentation (String, Integer, List of Strings, ...) instead of c++ types.

Fix typo in clang/Format/Format.h

Regenarate ClangFormatStyleOptions.rst

For now, I handle plural manually, but it can be changed, I also kept Unsigned, what are your thoughts?

Thanks for being so kind and responsive, it's really great to work on that :) as it is my first contribution to clang.

MyDeveloperDay accepted this revision.Aug 27 2021, 5:57 AM

LGTM, thanks for adding that and fixing the spelling mistake, let the others have time to chip in.

For now, I handle plural manually, but it can be changed, I also kept Unsigned, what are your thoughts?

I think thats ok for now, did you try building the file with sphinx-build, I run

`/usr/bin/sphinx-build -n ./docs ./html`

Then go and look at the html

Thanks for being so kind and responsive, it's really great to work on that :) as it is my first contribution to clang.

That is how it should be right ;-).

I assume you don't have commit access, we'll need your name and email address if you want one of us to land it, so we can accredit you with the contribution, but if you think you might like to work on some other things, it might be worth get ting the  "getting commit access" process started.

I think its always good to wait to give others some chance to comment before we commit, as we are all in different timezones.

look at the html

Done, thanks

I assume you don't have commit access, we'll need your name and email address

Here it is: Ludovic Jozeau <federandink@gmail.com> it should also be on my profile

but if you think you might like to work on some other things, it might be worth get ting the "getting commit access" process started.

Yeah, after reading https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access, I think I'll wait until I do another contribution. :)

I think its always good to wait to give others some chance to comment before we commit, as we are all in different timezones.

Ok, that's fine.

Looks good and I agree with the choices.

clang/docs/tools/dump_format_style.py
35

Could you not just check if there is a y at the end and replace it with ies, otherweise add an s?

This revision is now accepted and ready to land.Aug 27 2021, 1:13 PM
FederAndInk added inline comments.Aug 27 2021, 3:42 PM
clang/docs/tools/dump_format_style.py
35

Well, I thought about it, but then what about: whish -> whishes, leaf -> leaves, ... and irregulars? That's why I brought up the idea about using python inflect. Do you think it's enough for now to replace y -> ies and put an 's' to the others?

clang/docs/tools/dump_format_style.py
35

I'm okay with either way, in both cases there comes a time where someone must pay attention to add something here. We just have to look carefully in the review.

MyDeveloperDay added inline comments.Aug 29 2021, 4:30 AM
clang/docs/tools/dump_format_style.py
35

it would be nice if in the event of a missing plural it complained.

clang/docs/tools/dump_format_style.py
35

But that would mean we have to add each plural here. So every new list, which is not of strings, would most likely also needed to be added here. I think that is too much, but then again there are not that much lists, so maybe it's worth the work.

Ok, here is my proposal for plurals, I have some ideas, I think the safest/most complete would be to have a file tracking generated plurals and tell the user of the script to check newly generated plurals then add them to git, this would also allow reviewers to see new plurals generated. I'll put the updated revision of my idea soon.

Another idea without tracking plurals would be to show the list of generated word -> plural but I think it would add noise over time and wouldn't add value, but if we don't want to add a plurals.txt file in git it would be a possibility.

And again, I don't really understand if we are allowed or not to pull in a dependency such as pluralizer or inflect, this would be another idea

generate plurals, for now supporting -y ending (-y to -ies/-ys), track generated plurals and show new ones to be checked

MyDeveloperDay added inline comments.Aug 30 2021, 9:09 AM
clang/docs/tools/dump_format_style.py
33

This failed for me with invalid syntax

FederAndInk added inline comments.Aug 30 2021, 9:25 AM
clang/docs/tools/dump_format_style.py
33

Oh, ok, sorry, I might be using to recent python features, I'll remove type specifier, what is the recommended python version to use?

And again, I don't really understand if we are allowed or not to pull in a dependency such as pluralizer or inflect, this would be another idea

My Python knowledge is very limited, but if it runs out of the box, or with very limited required user action with a reasonably new Python (if Python 2 can be reasonably new is another question), I think it would be fine.

clang/docs/tools/dump_format_style.py
10

I think these should be sorted.

18

So you would add a plurals.txt in git and make the change visible through git diff? What about just reordering? I.e. Strings is on line 2, but after a change in line 1. Maybe sort the output?

I'm not against this procedure, but also not in favor. :)

33

Oh, ok, sorry, I might be using to recent python features, I'll remove type specifier, what is the recommended python version to use?

That I can not answer. I run

$ python --version
Python 3.9.5
FederAndInk marked an inline comment as done.Aug 30 2021, 12:06 PM
FederAndInk added inline comments.
clang/docs/tools/dump_format_style.py
33

Ok thanks, this is a reasonably recent version, I think we might want to explicitly specify python3 in the script to avoid using python2, I'll upload the diffs immediately.

FederAndInk marked 2 inline comments as done.Aug 30 2021, 12:21 PM
FederAndInk added inline comments.
clang/docs/tools/dump_format_style.py
10

ok, it will be done

18

This line is used to restore the version of plurals.txt to HEAD, so when calling the script multiple times, it keeps showing new plurals until plurals.txt gets committed.

So you would add a plurals.txt in git and make the change visible through git diff?

yes, that's it

What about just reordering?

I don't think we want ordering, it is ordered from first plural generated to last/new one, so git diff will only show new plurals

FederAndInk marked an inline comment as done.

add common plural rules, use python3 explicitly, reorder imports

MyDeveloperDay added inline comments.Aug 31 2021, 1:23 AM
clang/docs/tools/dump_format_style.py
10

are these standard imports or are we going to have to pip install something?

18

I'm personally not in favour of this script calling back to git

error: pathspec './plurals.txt' did not match any file(s) known to git
Traceback (most recent call last):
  File "./dump_format_style.py", line 18, in <module>
    subprocess.check_call(['git', 'checkout', '--', PLURAL_FILE])
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['git', 'checkout', '--', './plurals.txt']' returned non-zero exit status 1.
error: pathspec './plurals.txt' did not match any file(s) known to git
Traceback (most recent call last):
  File "./dump_format_style.py", line 18, in <module>
    subprocess.check_call(['git', 'checkout', '--', PLURAL_FILE])
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['git', 'checkout', '--', './plurals.txt']' returned non-zero exit status 1.

This is expected if plurals.txt is not in git yet, there is call that will do nothing on errors, but I use check_call to report errors from git checkout -- plurals.txt. To test, you can either replace temporarily the check_call by a call in clang/docs/tools/dump_format_style.py:18 or get the commit/create a temporary commit with plurals.txt

Maybe to simplify the testing/review procedure, I'll change check_call by call and if it is accepted, change it back to the checked version.

clang/docs/tools/dump_format_style.py
10

These are all standard imports, no worries :) (https://docs.python.org/3/library/subprocess.html)

otherwise, I would have directly used python inflect/or any other package to solve the plural problem

18

Well, I was inspired by other python scripts in the llvm-project repo that use subprocess to call git, it just touches the plurals.txt files to allow the user calling the script multiple times and be warned of the new plurals each time until it is committed. We could do without it, but we would lose a part of automation and the user of the script would have to partly manage the plurals.txt file.

A semi-solution I see is to at least tell the user to use git checkout -- plurals.txt if they want to clean up plurals/regenerate them and see which ones are new and/or call git diff -- plurals.txt to show new plurals which may be less "problematic" than git checkout -- plurals.txt. What do you think?

Tl;dr of solutions:

  1. reconsider as this technic is in use in other scripts in llvm-project
  2. call git diff instead of git checkout leading to less consistent and precise messages
  3. just tell the user what are their options (printing 'you can use git checkout -- plurals.txt or git diff -- plurals.txt'...)
  4. other ideas?

use correct python assignment from tuple, ask the user if they want to invoke git, use call instead of check_call to allow testing

From my point we can try that one, if there are problems we have plenty of time to revert it.

FederAndInk added inline comments.Sep 4 2021, 1:03 PM
clang/docs/tools/dump_format_style.py
21–25

Just to let you know, I've made modifications to be able to test the code with an untracked plurals file. We might want stricter checks by calling check_call instead, also we won't need line 22 above.

MyDeveloperDay requested changes to this revision.EditedSep 4 2021, 11:56 PM

I think interacting with git or even blocking for input doesn’t feel right

I have an issue out on llvm-premerge-checks that would run this tool and that couldn’t block like this

This revision now requires changes to proceed.Sep 4 2021, 11:56 PM

For me it feels enough to output to stderr that we are missing plurals and what they are for, it’s not like you can say what they should be is it? This will happen so infrequently that it’s not worth the environment issues that checking git will cause.

remove git invocation and user input, tell the user what the plurals are used for

FederAndInk marked 8 inline comments as done.Sep 5 2021, 7:14 AM

Ok, so I removed git invocation and I tell the user what are their options, at least warnings are emitted for new plurals, and they will be shown in git status/diff and in revisions.

MyDeveloperDay accepted this revision.Sep 6 2021, 12:30 AM
This revision is now accepted and ready to land.Sep 6 2021, 12:30 AM
MyDeveloperDay added a comment.EditedSep 6 2021, 2:59 AM

Could you just rebase, I'm not getting a clean merge of ClangFormatStyleOptions.rst

you are missing the changes from D108752: [clang-format] Group options that pack constructor initializers

MyDeveloperDay requested changes to this revision.Sep 6 2021, 3:03 AM
This revision now requires changes to proceed.Sep 6 2021, 3:03 AM

rebase main

MyDeveloperDay accepted this revision.Sep 6 2021, 4:44 AM
This revision is now accepted and ready to land.Sep 6 2021, 4:44 AM
MyDeveloperDay added inline comments.Sep 6 2021, 5:36 AM
clang/docs/tools/dump_format_style.py
18

printing this is just unnecessary please remove

FederAndInk added inline comments.Sep 6 2021, 5:38 AM
clang/docs/tools/dump_format_style.py
18

Ok, should I also remove line 18?

remove one print l17, tell me if we don't want the second print (now l17)

fixed typo in the script coment -> comment

I still think the printing is unnecessary, tell me when the plurals will be different otherwise keep quiet? otherwise, it confuses people making them think they have to do something

only print info on generated plurals if there is a new plural

@MyDeveloperDay is it all good? I'm sorry for all the misunderstanding,

Thanks for the merge, I now understand more how this all works and what we want from these scripts.

Also, don't we want to change the title and summary of the commit/revision? Because it does not reflect the real changes now in the repo

FederAndInk retitled this revision from [docs] Fix documentation of clang-format BasedOnStyle type to [docs][clang-format] use yaml types style.Sep 24 2021, 3:15 AM
FederAndInk edited the summary of this revision. (Show Details)