This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Add support for formatting Verilog code
AbandonedPublic

Authored by sstwcw on Mar 15 2022, 4:33 PM.

Details

Summary

Because Verilog uses the backtick instead of hash for preprocessor
directives, we replaced is(tok::hash) with a new function isPPHash.

UnwrappedLineParser::parseBlock has more flags, so we used a bit
field.

We need to lex the backtick, and the lexer had problems with that. So
we modified how the lexer looks for whitespace.

Diff Detail

Event Timeline

sstwcw created this revision.Mar 15 2022, 4:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 15 2022, 4:33 PM
Herald added a subscriber: mgorny. · View Herald Transcript
sstwcw requested review of this revision.Mar 15 2022, 4:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 15 2022, 4:33 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
sstwcw added a subscriber: svenvh.Mar 15 2022, 4:38 PM

Do we have people who use Verilog and knows the clang-format code base? @svenvh Your email address looks like you work for a hardware company. If you know Verilog would you please have a look at this patch?

sstwcw added inline comments.Mar 15 2022, 4:52 PM
clang/lib/Format/UnwrappedLineParser.cpp
534

One of the people in charge said multiplying with a boolean might trigger warnings. Here I compiled with gcc. This version doesn't trigger warnings. The other way to do it, CanContainBracedList ? BLOCK_CAN_CONTAIN_BRACED_LIST : 0, triggers a warning that I shouldn't mix enum and integer.

Somehow the bot can't create a branch when trying to merge this patch. The error is failure to lock. The error persists after restarting the process. Does anyone know why?

Firstly thank you for taking the time to submit the patch, I've only taken a cursory view at first and I definitely have some comments, I've added the main clang-format reviewers

This is a very invasive patch so there are a number of things we might want to do to limit its size.

To start you off lets do some housekeeping..

  1. Can you please add a release note
  2. Can you add the changes to the ClangFormat.rst
  3. Can you log a github issue and mention it in the summary
  4. This needs a better summary explaining the rational for adding Verilog and what style guide we are following
MyDeveloperDay requested changes to this revision.Mar 16 2022, 1:18 AM
This revision now requires changes to proceed.Mar 16 2022, 1:18 AM
MyDeveloperDay added inline comments.Mar 16 2022, 1:47 AM
clang/docs/ClangFormatStyleOptions.rst
2241

I don't feel "InstancePorts" is clear here I assume this is a Verilog thing, if we are going to have Verilog specific options I think they should be prefixed with the name

VerilogBreakBetweenInstancePorts

3207

Can we have a hyperlink? to what we are following

clang/include/clang/Format/Format.h
1687

I'd like to see this be a general option that could be used for all languages

4230

This is part of in my view a different change which could be separate, I've seen this logged before

clang/lib/Format/ContinuationIndenter.cpp
1043

Could you point me to the "unit test" that covers this case

clang/lib/Format/Format.cpp
697

given the size of your patch I'd break these kinds of unrelated changes out we can do an NFC patch

If you don't have commit access you need to get that first. I doubt someone else is going to land this on your behalf, you need to own it and you need to be committed to supporting this longer term (are you?)

1210

Ditto above

1213

I'm not going to lie I don't like the name "InstancePorts" it doesn't tell me what its doing.. is this a known technical term?

1266

can we have isVerilog() I hate all these == !=

3408

This is in my view an unrelated change break it out

clang/lib/Format/FormatToken.h
62

I feel this is a separate patch

80

comment?

115

is this comment always correct?

924

final is a CSHARP and C++ keyword too

1029

and is a C++ keyword right?

MyDeveloperDay added inline comments.Mar 16 2022, 2:14 AM
clang/lib/Format/ContinuationIndenter.cpp
1373

I'm not a fan of these isPPHash changes as I think it makes the impact on the other code much greater

I think it would be better to handle the backtick case completely separately, in my experience as soon as you try to represent 2 things as one you are in trouble, my developer gut says there is something wrong here even if it works 99% of the time the last 1% will kill you.

clang/lib/Format/FormatToken.h
723

Could you show the unit test that covers this, does it impact C++ at all?

737

unrelated change

942

I feel like you are overloading a different level of classification here as to the "type of function" this just makes this structure now massive and hard to read.

1153

What standard are we following here? for example I notice noshowcancelled is not covered..

despite me actually doing some VHDL during my degress, I no nothing about this, just want to understand what we should be covering.

I guess alot of these words are not unit tested right?

1177

this feels like mixed metaphors as I said above

1192

I'm kind of wondering why this is needed (can you point me to the unit tests) because doesn't javascript already support this kind of interpolated string?

1324

Does this need to be VerilogSpecific?

1468
clang/lib/Format/FormatTokenLexer.cpp
84

unrelated change commit separately to reduce patch size

471

year really not nice.. its like we can never use tok::hash again!

513

This change could be pull out separately to reduce the size of this patch

929

This is an unrelated change surely... and you point me to some unit tests for this

1175

This is only for Verilog right? I think we should make that clear readRawTokenVerilogSpecific

clang/lib/Format/TokenAnnotator.cpp
736

Could this impact any of the non C++ languages like C#

1195

To ease the pain of such a massive patch you could have rolled this change in first, switching from the if's to a switch is a good change but you could massively reduce the impact of the patch by doing this in phases

1416

Did other languages creep in?

1869

is this something verilog specific? what impact does this have on https://en.cppreference.com/w/cpp/keyword/not

2001

probably could be a separate patch

clang/lib/Format/TokenAnnotator.h
46

I feel this might be an unrelated change, is this something that needed for something specific, could you point to the unit tests

MyDeveloperDay added a comment.EditedMar 16 2022, 2:22 AM

I'm very much a "clang-format all the things" kind of person which is why I added C# and JSON support, as someone who moves between languages all day every day I want one tool to rule them all.. I'd love it if it would format YAML (but thats another story)..

For me this is a noble patch, but its just WAY too big, I'm really struggling to get through even half of it, and I've got my super xray specs on it because I think when I added C# support I did it in smaller phases and I didn't have full support out of the box, (and others came in and added it later), I feel the smaller patches can be reviewed more easily even if the full language support isn't quite there from day one.

This looks like a thorough implementation, but along the way you've changed things that possibly annoyed you, which I think are somewhat unrelated to adding verilog (although they made adding it easier), but its just making it patch so hard to get though thoroughly. (and I'm not even the most critical of the reviewers), I think some of the other might eat you alive!! ;-)

but lets me say, my review comment might seem overly negative, I don't intend them that way, I think this could make clang-format highly usable to many new engineers, and this is a noble effort. I'm just not sure how comfortable everyone is going to be about dropping this.

It could have quite an impact on downstream forks.

Do we need to consider other potential styles of Verilog? https://github.com/lowRISC/style-guides/blob/master/VerilogCodingStyle.md & https://cs233.github.io/verilogstyle.html are you building in the ability for Verilog specific items to be formatted in different way? (if that's even a thing)

krasimir added inline comments.Mar 16 2022, 2:47 AM
clang/lib/Format/Format.cpp
3451

I kinda prefer the old style of handling this I find it simpler to read. I'd be OK if getAssumeFilenameHelp didn't print the mapping. Also the suffix array has a bit implicit semantics going on that it will be the first suffix matching, which isn't immediately clear in isolation.
Also it looks like this changes behavior: in the old case, some of the matches were case-insensitive.

You need to rebase the patch, it doesn't apply.

clang/lib/Format/Format.cpp
3451

I agree with @krasimir here,

Actually there isn't good alignment with the clang-format frontend and the library or any of the python or vim plugin scripts either over what extensions are/are not supported..

Very much a separate patch to cover everything in my view.

Also I've seen requests were we should not "guess a language" that isn't in the .clang-format style and that feels kind of related too..

i.e. those that use ObjC only and only have Language: ObjectiveC in their clang-format file probably want extensions mapped to ObjC

clang/lib/Format/UnwrappedLineParser.cpp
534

this is a hard no from me.

559

could be separate small patch

580

I think we are losing clarity here, ParseDefault does more than just parseStructuralElement its inverting the HasOpeningBrace and I just can't tell if the implications here would have knock on effects.

If ParseDefault was suitable here why did the author of ParseDefault not cover that?

4499–4500

doesn't the first line of isVerilogPPDirective say if Language!=Verilog return false, if so is the Language check needed here?

clang/lib/Format/UnwrappedLineParser.h
110

Oh! please no... I can't say how much of my life has been ruined by flags and the various | || & && ~& bugs, I'm sorry I'd rather have a structure and it be clear

clang/unittests/Format/FormatTestUtils.h
22

Can you add a comment as to why I would need to HandleHash=false

This is an enormous patch. @MyDeveloperDay mentioned different places which could be separate patches, but in addition could the adding of the verilog language be split up, so that one can comprehend the patch in a reasonable time?

clang/lib/Format/UnwrappedLineParser.cpp
534

bool should only be used as boolean value, not as integral, even if c++ allows it.

clang/lib/Format/UnwrappedLineParser.h
110

struct would be better, yes.

sstwcw abandoned this revision.Mar 17 2022, 5:32 AM

Abandoned? huh? are you breaking it apart?

Yes. I am surprised that you asked since everyone asked me to break it apart.

Yes. I am surprised that you asked since everyone asked me to break it apart.

Well I was thinking you move the unrelated parts out and then reduce the side of this patch, (but you'll have patch dependencies but then the review will be more manageable for us.)

No one is saying that adding Verilog isn't a good idea. (it is actually as long as its not too invasive), I like @HazardyKnusperkeks suggestion of bringing it in in smaller pieces, I could even see us landing pieces before
it was fully complete. (Like we did with C#, which probably isn't 100% complete either, but we incrementally add it)

If people don't try and format Verilog (which they shouldn't be expecting to then it won't hurt if its not 100% complete.) And from my perspective is seems it NOT possible for you to land Verilog without completely rewriting large
swaths of the seemingly unrelated other code first? is the structure of clang-format so bad that you can't possible work with what is there without having to come in a refactor it all from underneath us?

Bottom line, here is the impression I get... You likely have a downstream fork of clang-format that supports Verilog, and you are trying to upstream it..problem is as you developed it you refactored a lot of things (to some extent for the better in your view), But you didn't add unit tests for those refactoring because frankly you didn't need to it was your local copy.

Now landing those refactoring seems like a good idea to you, but we have to take them on sight unseen that those refactoring are ok, With no unit tests to back up your justification even if that just means bolstering the existing tests with new tests that cover more cases. (your isIf() change is one, I'd be HIGHLY surprised if that didn't cause some sort of regressions for someone somewhere, when I say regression I mean changes, both positive and negative but people call them regression! either way even if we fix stuff..)

I feel like we are trying to be super responsive to your reviews (no seriously I used to have to wait weeks for someone to even have a look!, we have a great team of reviewers and contributors they are highly skilled), You've been an LLVM member in Phabricator for not a month. Some people have multiple years experience here, What are we to do? accept everything on face value without unit tests?

But I'm sorry I feel bad, because at every turn I'm like "Really, are we doing that now? (MACRO is FormatToken)", I hate doing it but I feel like I'm pushing back on every review. But we can't just let stuff in without trying to do a full and through review and that means unit tests, you understand that right?

Don't expect to drive by with this review, some of my reviews took 1-2 years to land, my dedication to stay and fix bugs in the meantime proved to the reviewers that I was serious. It could take this level of time and commitment to gain peoples trust to land such a large patch.

You are clearly highly capable and understand the clang-format code base. We want you as a contributor because you understand it already. But this is a most unusual onboarding, but I feel I personally we have to treat the reviews as I would any other review coming in...which I'm afraid is with scrutiny.