This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Add basic support for formatting C# files
ClosedPublic

Authored by MyDeveloperDay on Feb 19 2019, 1:33 PM.

Details

Summary

This revision adds basic support for formatting C# files with clang-format, I know the barrier to entry is high here so I'm sending this revision in to test the water as to whether this might be something we'd consider landing.

Tracking in Bugzilla as:
https://bugs.llvm.org/show_bug.cgi?id=40850

Justification:
C# code just looks ugly in comparison to the C++ code in our source tree which is clang-formatted.

I've struggled with Visual Studio reformatting to get a clean and consistent style, I want to format our C# code on saving like I do now for C++ and i want it to have the same style as defined in our .clang-format file, so it consistent as it can be with C++. (Braces/Breaking/Spaces/Indent etc..)

Using clang format without this patch leaves the code in a bad state, sometimes when the BreakStringLiterals is set, it fails to compile.

Mostly the C# is similar to Java, except instead of JavaAnnotations I try to reuse the TT_AttributeSquare.

Almost the most valuable portion is to have a new Language in order to partition the configuration for C# within a common .clang-format file, with the auto detection on the .cs extension. But there are other C# specific styles that could be added later if this is accepted. in particular how { set;get } is formatted.

I'm including an example of its usage Before/After to demonstrate it initial capability.

C# code, formatted with current clang-format

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Reflection;

namespace ConsoleApp1 {
class Program {

  [MainAttribute] static void Main(string[] args) {}

public
  void foo() {
    float f = 1.0;
    int a = (int)f;
    string[] args = new string[1];

    args[a] = "Hello";
  }

public
  string Foo {
    set;
    get;
  }
}

    [ClassAttribute] public class Bar {
public
  Bar(){}

      [MethodAttribute] public bool foo() {
    return true;
  }

  [XmlElement(ElementName = "TaxRate")] public int taxRate;

  [MethodAttribute] internal bool foo() {
    string longstring =
        "VeryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryLongLongLongLong";
    return true;
  }
}

internal class InternalBar {
public
  InternalBar() {}
}

    [TestClass][DeploymentItem("testData")] public class Test {
}
}

Same C# code, formatted with current clang-format with this revision (no .clang-format file, out-of-box formatting)

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Reflection;

namespace ConsoleApp1 {
class Program {

  [MainAttribute]
  static void
  Main(string[] args) {}

  public void foo() {
    float f = 1.0;
    int a = (int) f;
    string[] args = new string[1];

    args[a] = "Hello";
  }

  public string Foo {
    set;
    get;
  }
}

[ClassAttribute]
public class Bar {
  public Bar(){}

  [MethodAttribute]
  public bool foo() {
    return true;
  }

  [XmlElement(ElementName = "TaxRate")]
  public int taxRate;

  [MethodAttribute]
  internal bool
  foo() {
    string longstring =
        "VeryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryLongLongLongLong";
    return true;
  }
}

internal class InternalBar {
  public InternalBar() {}
}

[TestClass]
[DeploymentItem("testData")]
public class Test {}
}

Diff Detail

Event Timeline

MyDeveloperDay created this revision.Feb 19 2019, 1:33 PM

Add to release notes

Note to self...Underdevelopment:

  • adding support for => (JSBigArrow) for C# Lambas
  • adding support for $"liternal {string} with arguments"

Increase C# formatting capabilities

  • don't split regions markers across lines
  • lexer support for verbatim string literals
  • support for interpolated string literals (C#6)
  • support for interpolated verbatim string literals
  • support for keyword escaping @enum, @class etc..
    • support for null conditionals

Add additional unit tests to support the above

MyDeveloperDay added a project: Restricted Project.Feb 24 2019, 1:35 AM
MyDeveloperDay edited the summary of this revision. (Show Details)Feb 24 2019, 11:59 PM
MyDeveloperDay added a subscriber: llvm-commits.

Fix a crash running clang-format over large C# code base
Add support for C# Null Coalescing

Remove non related clang-format changes
run git clang-format
Add a Microsoft style (trying to match most closely the more traditional Microsoft style used for C#)
This will allow, CSharp to be added to the config by


Language: CSharp
BasedOnStyle: Microsoft
.....

klimek added inline comments.Mar 20 2019, 3:58 AM
include/clang/Basic/TokenKinds.h
80 ↗(On Diff #190737)

This change looks pretty good, minus the changes outside Format/ :( Do you see any chance to fix this by extending the token merging? It's unlikely that the community will be convinced to add some minor parts of C# to clang proper just to support clang-format.

MyDeveloperDay marked an inline comment as done.Mar 20 2019, 10:02 AM
MyDeveloperDay added inline comments.
include/clang/Basic/TokenKinds.h
80 ↗(On Diff #190737)

I know I'm a little nervous about having this code out where it might break the actual compiler

Before I got stuck with how C# does its verbatim and interpolated strings especially when \ is in a string just before the "

let me go back and see if I can't do this with merging the tokens now I know a little more

Addressing Review Comments

  • Remove all non Format only code (i.e. changes to clang/Basic and clang/Lexer to support verbatim interpolated strings C# 6)
  • For the most part the need to for them is removed by merging of tokens during the FormatLexer phase
  • There is one obscure failure scenario, but for a first pass I think it can be ignored (tests commented out) where a \ is used unescaped just before a " (because of paths can be more common than you think)
    • string s = @"ABC\" + ToString("B");";

      In this case, the string @"ABC\" is seen as @"ABC\" + ToString(", clang-format then gets confused about where it should put spaces and we end up with
    • string s = @"ABC\" + ToString(" B ");";
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 B=0 BK=0 P=0 Name=identifier L=6 PPK=2 FakeLParens=2/ FakeRParens=0 II=0x252fbe52028 Text='string'
 M=0 C=1 T=StartOfName S=1 B=0 BK=0 P=220 Name=identifier L=8 PPK=2 FakeLParens= FakeRParens=0 II=0x252fbe522d0 Text='s'
 M=0 C=0 T=BinaryOperator S=1 B=0 BK=0 P=22 Name=equal L=10 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='='
 M=0 C=1 T=Unknown S=1 B=0 BK=0 P=22 Name=string_literal L=31 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='@"ABC\" + ToString("'
 M=0 C=0 T=Unknown S=1 B=0 BK=0 P=23 Name=identifier L=33 PPK=2 FakeLParens= FakeRParens=0 II=0x252fbe52300 Text='B'
 M=0 C=1 T=Unknown S=1 B=0 BK=0 P=23 Name=string_literal L=38 PPK=2 FakeLParens= FakeRParens=1 II=0x0 Text='");"'
 M=0 C=0 T=Unknown S=0 B=0 BK=0 P=23 Name=semi L=39 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=';'
klimek accepted this revision.Mar 21 2019, 3:26 AM

LG. This looks pretty sharp (scnr).

clang/lib/Format/FormatTokenLexer.cpp
177

Tiny nit: please try to keep comments formatted as sentence (upper-case first letter, "." in the end).

clang/unittests/Format/FormatTestCSharp.cpp
19

If everything's static, you don't need a test fixture at all, you can just make it plain functions and use TEST() instead of TEST_F().

This revision is now accepted and ready to land.Mar 21 2019, 3:26 AM
MyDeveloperDay marked 3 inline comments as done.

Address comment nits

clang/unittests/Format/FormatTestCSharp.cpp
19

I was actually just following the existing style of the other tests, but looking into what you said it seems we could almost do this in the other tests too.

This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Mar 22 2019, 11:41 AM
thakis added inline comments.
docs/ReleaseNotes.rst
169 ↗(On Diff #191669)

Shouldn't this be in 'clang-format', not in 'AST Matchers'?

RKSimon added inline comments.
lib/Format/FormatTokenLexer.cpp
249 ↗(On Diff #191669)

@MyDeveloperDay Should this be

Identifier->Type = Question->Type;

Reported in https://www.viva64.com/en/b/0629/

MyDeveloperDay marked an inline comment as done.May 1 2019, 4:43 AM
MyDeveloperDay added inline comments.
lib/Format/FormatTokenLexer.cpp
249 ↗(On Diff #191669)

I think most likely I want to keep it as an identifier so we are treating arg? as arg

I feel like this line is probably not needed, there is a review for this.. D61281: [clang-format] Fixed self assignment