Page MenuHomePhabricator

clang-format: support external styles
Needs ReviewPublic

Authored by Typz on Aug 1 2018, 9:50 AM.

Details

Summary

This patch allows defining external styles, which can be used exactly
like the embedded styles (llvm, google, mozilla...).

These styles are clang-format files installed either systemwide in
/usr/local/share/clang-format, or per-user in ~/.local/share/clang-
format. These can be specified by simply using the name of the file,
and clang-format will search the directories for the style:

clang-format -style=foo-1.0

The patch also allows loading specifying a file name directly, either
relative or absolute:

clang-format -style=/home/clang-format-styles/foo-1.0
clang-format -style=styles/foo-1.0

This works also in BaseOnStyle field, which allows defining compagny-
wide (and possibly versionned) clang-format styles, without having to
maintain many copies in each repository: each repository will simply
need to store a short .clang-format, which simply references the
compagny-wide style.

The drawback is that these style need to be installed on each computer,
but this may be automated through an OS package. In any case, the error
cannot be ignored, as the user will be presented with an error message
if he does not have the style.

NOTE: this may be further improved by also allowing URL (http://, git://...) in this field, which would allow clang-format to automatically download the missing styles.

Diff Detail

Event Timeline

Typz created this revision.Aug 1 2018, 9:50 AM

Doesn’t clang-format support project-based .clang-format or _clang-format? How do they play with this diff?

Typz added a comment.Sep 4 2018, 8:51 AM

clang-format does indeed support .clang-format, which is great for *isolated* projects, and which is not affected by this patch.

This patch addresses the issue of *centralizing* the definition of styles, e.g. allowing individual projects to reference externally defined styles. This is useful when deploying a coding styles compagny-wide, to avoid copying the configuration in each project (and later having problem issues to maintain the styles or check if the correct style is indeed used). Another way of seeing it is that it allows extending the styles which are hard-coded in clang-format (llvm, google, ...)

I don't understand the use-case this patch is realizing. Suppose I have a bunch of projects that have to share a format style. Then I can checkout them under a common root directory and put an appropriate .clang-format file there.

Typz added a comment.Sep 25 2018, 1:50 AM

I don't understand the use-case this patch is realizing. Suppose I have a bunch of projects that have to share a format style. Then I can checkout them under a common root directory and put an appropriate .clang-format file there.

You can do this manually, but this would not be in source-control and not "automatic" : you need to remember to copy the file in the root directory, and be sure to copy the correct one.
Also, this manual copy approach does not support updates very well.

We typically use large workspaces with many repositories (using git-repo), containing both open-source projects (hopefull with their own .clang-format, but more often not without any) and in-house projects.
Putting a .clang-format in the root directory would make it the default not only for our projects, but also for all OSS projects with no .clang-format, where it likely is not correct.
In addition, due to historical or regulatory reasons we unfortunately have multiple coding rules for the same language, in different projects : so we cannot just put at the root directory.

Allowing to implement "custom" styles outside of clang-format fixes this, by allowing to define this style globally; and then simpy reference the style in the project.
We can then also version the styles, e.g. create myStyle-1.0 and myStyle-1.1 to handle changes and let projects upgrade to the latest version as needed, and possibly create links to allow "tracking" the latest revision of a style (e.g. myStyle-latest is a symlink which points to myStyle-1.1).

The idea here does seem to be a natural extension of -style, at least for the case when the arg is a filename directly. I'm not opposed, happy to review this.

I do want to probe the use case a bit though: we have found that configuring via -style= doesn't scale that well to lots of codebases and tools. e.g. if someone's running clang-tidy or clangd via some driver, are they going to have a way to plumb through the right -style=?

Being able to discover the right style from the filesystem is powerful, and if I was going to use this flag, I'd consider symlinking the style-file to subproject/.clang_format instead. That way the setting is persistent and available to all tools, rather than us needing to get it right on each invocation.

lib/Format/Format.cpp
1036

I'm not sure these prefixes are a good idea - can you explain what the purpose is, and why the simpler model of just using the path doesn't work?

Certainly this needs some more thought about how it would work on windows etc. I'd suggest limiting this patch to filenames.

Typz added a comment.EditedOct 29 2018, 10:28 AM

The idea here does seem to be a natural extension of -style, at least for the case when the arg is a filename directly. I'm not opposed, happy to review this.

I do want to probe the use case a bit though: we have found that configuring via -style= doesn't scale that well to lots of codebases and tools. e.g. if someone's running clang-tidy or clangd via some driver, are they going to have a way to plumb through the right -style=?

Being able to discover the right style from the filesystem is powerful, and if I was going to use this flag, I'd consider symlinking the style-file to subproject/.clang_format instead. That way the setting is persistent and available to all tools, rather than us needing to get it right on each invocation.

I totally agree: using '-style' does not scale well. However this patch works also when the style is used in the 'BaseOnStyle' tag. This is actually the way we expect to use this: store a small .clang_format file in each repo, which simply references a clang-format file which must be installed (manually, for now) on each user's computer.

The idea here does seem to be a natural extension of -style, at least for the case when the arg is a filename directly. I'm not opposed, happy to review this.

I do want to probe the use case a bit though: we have found that configuring via -style= doesn't scale that well to lots of codebases and tools. e.g. if someone's running clang-tidy or clangd via some driver, are they going to have a way to plumb through the right -style=?

Being able to discover the right style from the filesystem is powerful, and if I was going to use this flag, I'd consider symlinking the style-file to subproject/.clang_format instead. That way the setting is persistent and available to all tools, rather than us needing to get it right on each invocation.

I totally agree: using '-style' does not scale well. However this patch works also when the style is used in the 'basedOn' tag. This is actually the way we expect to use this: store a small .clang_format file in each repo, which simply references a clang-format file which must be installed (manually, for now) on each user's computer.

BasedOn: "/path/to/some/file" would certainly work; but why not just use a symlink?

Typz added a comment.Oct 29 2018, 10:41 AM

Being able to discover the right style from the filesystem is powerful, and if I was going to use this flag, I'd consider symlinking the style-file to subproject/.clang_format instead. That way the setting is persistent and available to all tools, rather than us needing to get it right on each invocation.

I did not think of this solution. However:

  • I am not sure how it works when the .clang-format link is stored in SCM: not sure git will allow storing a link which points outside of the repo... But it may be worth trying, it could be an alternative
  • Making a symlink will hardcode the full path of the style, which may not suit different users: maybe in some cases the styles could be installed per-user (as users do not have root permissions), which they would be installed system-wide (in containers used for CI builds, where we may not know which user will actually run the build)
  • Making a symlink will definitely not be portable to different OS : esp. on Windows, the path would probably not be the same
  • I am not sure a symlink can point to a "~"-relative path, and automatically perform the user HOME path resolution
lib/Format/Format.cpp
1036

the goal is actually to store these "base" styles in some global folder, so that multiple projects can reference it (through their own .clang-format file), without having to make any copy.

The idea is that the project is under SCM, and simply includes a reference to the style: in its own .clang-format, which would ideally only contain a single basedOn tag. The styles are defined globally, then each project only indicates which styles in uses.

But indeed, the 'HOME' folder on windows is probably not this one...

ping?

Sorry for losing track of this.

I think -style=<filename> is a logical extension of the current options. I'm less sure of supporting it in BasedOnStyle, but happy to go either way.

Referring to styles by names in installed style directories seems like a relatively marginal feature that would need a more careful design, e.g.:

  • respecting platform conventions
  • supporting build-time configuration
  • understanding how distro packaging is going to work
  • thinking about addressing the resulting fragmentation as .clang-format files are shared but the referenced files are not

Within a tightly controlled organization, these things are less of an issue. We've had luck simply making local changes to support different styles for these scenarios, though that's not ideal.
One possibility to reduce the scope here: search for unknown names on under $CLANG_FORMAT_STYLES if the variable is set. That way it can be set by administrators within an org if appropriate, but clang-format doesn't have to have opinions about the policy here, and all binaries still behave the same.

lib/Basic/VirtualFileSystem.cpp
288 ↗(On Diff #158559)

(this change isn't a good idea - if you want to expand tilde call fs::expand_tilde - it's not related to VFS)

Typz added a comment.Thu, May 23, 12:39 AM

ping?

Sorry for losing track of this.

I think -style=<filename> is a logical extension of the current options. I'm less sure of supporting it in BasedOnStyle, but happy to go either way.

To keep traceability and ensure consistency, storing a clang-format file in each repository is still required. So BasedOnStyle is actually the feature we need, and from our POV the -style=<filename> extension is a side effect :-)

Referring to styles by names in installed style directories seems like a relatively marginal feature that would need a more careful design, e.g.:

  • respecting platform conventions

I knew this one was coming: the patch is clearly not complete on this aspect, but I pushed it already to get an early feedback on the generic goal/approach.
This definitely needs some fixing, and to be honest I was hoping there was already something in LLVM code base on this topic (e.g. list of standard path, access to base installation path...) : I tried to look. but did not find anything yet. Any advice?

  • supporting build-time configuration

I thought of injecting the platform-specific path list through the build sytem as well. And this would allow overriding it with any other path list as appropriate.

  • understanding how distro packaging is going to work

I don't understand what you mean exactly. With build-time configuration, the package can be customized to look in the appropriate places for the specific distribution.

Can you please clarify the issues you see?

  • thinking about addressing the resulting fragmentation as .clang-format files are shared but the referenced files are not Within a tightly controlled organization, these things are less of an issue. We've had luck simply making local changes to support different styles for these scenarios, though that's not ideal.

Our organization is not so controlled, but we want to ease the deployment and maintenance of the styles, hence this approach.
(by the way, ideally I would like to eventually further extend this patch to support retrieval of external styles through url : e.g. to get style from a central server, through http, git....)

One possibility to reduce the scope here: search for unknown names on under $CLANG_FORMAT_STYLES if the variable is set. That way it can be set by administrators within an org if appropriate, but clang-format doesn't have to have opinions about the policy here, and all binaries still behave the same.

I don't think having no search path by default (if the env var does not exist or is empty) is so nice, but I can definitely add such an environment variable override.

Herald added a project: Restricted Project. · View Herald TranscriptThu, May 23, 12:39 AM
Typz updated this revision to Diff 201693.Tue, May 28, 9:37 AM
Typz marked 3 inline comments as done.
  • Rebased
  • Adapt styles search path to platform conventions
  • Allow customizing search path at compile time
  • Allow overriding search path at runtime through CLANG_FORMAT_STYLES_PATH env variable

One thing that's unclear to me is whether your aim is to

  1. solve a concrete problem for your organization
  2. solve a family of problems for similar organizations
  3. add a new way of configuring styles for many types of users/projects

If it's 1) I think this is very reasonable and we should focus on finding the simplest mechanism that will work. Is it possible to always set an environment variable, or a build-time parameter, or install the style file in a well-known absolute path (is there really a mix of win/unix users?)
If it's 2), I think what's missing is evidence that other orgs have similar problems and requirements.
Option 3) is interesting, but much more ambitious and needs to start with a design doc that discusses the use cases, alternatives, and implications. Certainly anything involving fetching styles via git/http falls into this bucket, I think.

@klimek in case I'm way off base here, or there have been past discussions that shed light on this.

With that in mind, I'd be very happy to approve the build time config and/or an env variable, as long as they're off by default. It's easy to turn them on later, but not easy to turn them off.
If they're going to be on by default, I think we need a strong reason.

  • respecting platform conventions

I knew this one was coming: the patch is clearly not complete on this aspect, but I pushed it already to get an early feedback on the generic goal/approach.
This definitely needs some fixing, and to be honest I was hoping there was already something in LLVM code base on this topic (e.g. list of standard path, access to base installation path...) : I tried to look. but did not find anything yet. Any advice?

There's lots of code in Driver that manipulates search paths in platform-specific ways :-) Most of my experience with that urges me to avoid it if at all possible, and otherwise keep it very simple.

  • understanding how distro packaging is going to work

I don't understand what you mean exactly. With build-time configuration, the package can be customized to look in the appropriate places for the specific distribution.

Can you please clarify the issues you see?

There's a mechanism, but how is it to be used? Will/should projects with a style guide provide style packages for distros? Or should these be part of the "official" clang-format package?
If separate packages exist, how much is it going to confuse users that clang-format will silently format the same project with a .clang-format file different ways depending on what's installed?

(by the way, ideally I would like to eventually further extend this patch to support retrieval of external styles through url : e.g. to get style from a central server, through http, git....)

One possibility to reduce the scope here: search for unknown names on under $CLANG_FORMAT_STYLES if the variable is set. That way it can be set by administrators within an org if appropriate, but clang-format doesn't have to have opinions about the policy here, and all binaries still behave the same.

I don't think having no search path by default (if the env var does not exist or is empty) is so nice, but I can definitely add such an environment variable override.

lib/Format/Format.cpp
1080

This fallback silently changes callers of getPredefinedStyle() with no FS to read from the real filesystem (if the name doesn't match anything).
This seems bad for embedders, and it's not obvious what their workaround is. I'd suggest we either want to change the signature (e.g. by not having a default param) and force them to choose, or make nullptr mean no FS and document the inconsistency with getStyle().

Typz added a comment.Fri, Jun 7, 3:14 AM

One thing that's unclear to me is whether your aim is to

  1. solve a concrete problem for your organization
  2. solve a family of problems for similar organizations
  3. add a new way of configuring styles for many types of users/projects

First and forehand, I have a problem to solve in my organization : we have many projects, and maintaining the config file in some many repositories is not practical.
In some case (like, LLVM, google or mozilla), this is not an issue because the formatting rules are hard-coded in clang-format.
In other large organisation (like Qt, KDE or many private compagnies), there may be coding-rules already in place and it is not practical to store the configuration in clang-format.
(This is not so much an issue in small-organisation, with few projects/repositories : each project/repo may manage its own clang-format config file)

The goal is to support this use-case : organisation-wide styles, without patching clang-format.

If it's 1) I think this is very reasonable and we should focus on finding the simplest mechanism that will work. Is it possible to always set an environment variable, or a build-time parameter, or install the style file in a well-known absolute path (is there really a mix of win/unix users?)

In our case, we actually have more than one "standard" style, a combination of various OS (linux, windows, macos), and not a very strong control on user computers. So we cannot rely on a specific file or variable being setup by an administrator.
Therefore, I think the solution should still be generic enough, and there is no reason to restrict it to one use case.

If it's 2), I think what's missing is evidence that other orgs have similar problems and requirements.

I think many orgs have the same issue, but some of them have found a solution by hard-coding their style in clang-format...

Option 3) is interesting, but much more ambitious and needs to start with a design doc that discusses the use cases, alternatives, and implications. Certainly anything involving fetching styles via git/http falls into this bucket, I think.

git/http/... is indeed much more complicated. It would solve the issue of "transparent" setup, but introduces many subtleties : how to cache the result (to handle offline use or downtime issues), how to handle updates....

@klimek in case I'm way off base here, or there have been past discussions that shed light on this.

With that in mind, I'd be very happy to approve the build time config and/or an env variable, as long as they're off by default. It's easy to turn them on later, but not easy to turn them off.
If they're going to be on by default, I think we need a strong reason.

I they are going to be off by default, it means we would still need to patch clang-format to use it, correct ?
I would like to avoid this, as it is quite time consuming; and it would actually be easier in that case to just add yet another hard-coded style in the patched binary...

  • understanding how distro packaging is going to work

There's a mechanism, but how is it to be used? Will/should projects with a style guide provide style packages for distros? Or should these be part of the "official" clang-format package?
If separate packages exist, how much is it going to confuse users that clang-format will silently format the same project with a .clang-format file different ways depending on what's installed?

The goal is to actually separate the styles from clang-format : so I don't see the point to make them part of the official clang-format package.
Usage may be different: the styles may be setup through different packages (e.g. Qt style in qt-core package), installed manually by user, ....
This is surely not perfect, since different packages may indeed provide the same style : technically this is not an issue (packages must just be marked as conflicting), but it is indeed the organisation's responsibility to use different names for the styles...
(to some extent, this may be improved by passing URLs for external styles, e.g. from git ; but this may even be an issue if different mirrors must be used...)