This is an archive of the discontinued LLVM Phabricator instance.

Multilib YAML parsing
ClosedPublic

Authored by michaelplatings on Jan 30 2023, 1:49 PM.

Details

Summary

The format includes a ClangMinimumVersion entry to avoid a potential
source of subtle errors if an older version of Clang were to be used
with a multilib.yaml that requires a newer Clang to work correctly.
This feature is comparable to CMake's cmake_minimum_required.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptJan 30 2023, 1:49 PM
michaelplatings requested review of this revision.Jan 30 2023, 1:49 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 30 2023, 1:49 PM
miyuki added a subscriber: miyuki.Feb 1 2023, 5:15 AM
miyuki added inline comments.
clang/lib/Driver/Multilib.cpp
93

Please use the LLVM regular expression engine (llvm/Support/Regex.h) instead of std::regex. It's more portable, e.g., I've seen bugs and performance issues with MSVC implementation that don't manifest on Linux.

miyuki added inline comments.Feb 1 2023, 5:20 AM
clang/lib/Driver/Multilib.cpp
94

Please use llvm::find_if(InFlags, ...)

149

I think it would make sense to validate regular expression syntax as well.

Incorporated changes requested by @miyuki

michaelplatings marked 3 inline comments as done.Feb 1 2023, 8:32 AM

Move the regular expression flag matching functionality into MutlilibSet. This (a) simplifies the code for loading from multilib.yaml; and (b) makes the flag matching functionality easier to access.

michaelplatings retitled this revision from Multilib YAML parsing to [NFC] Multilib YAML parsing.Feb 6 2023, 3:55 AM
phosek added a comment.Feb 7 2023, 9:23 PM

Do you know if regular expressions are necessary to cover the existing use cases? In our experience, while regular expressions are powerful, they also tend to be error prone and more difficult to reason about. Would glob patterns that are implemented by https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Support/GlobPattern.h be sufficient?

Do you know if regular expressions are necessary to cover the existing use cases?

Yes. One use case is mapping later architecture versions to the last architecture version supported by the multilib. For example, this would map armv8 versions after 8.5 to 8.5:

- regex: target=armv8\.([6-9]|[1-9][0-9]+)-none-eabi
  matchFlags: [target=armv8.5-none-eabi]

Make "variants" required

For the existing YAML files within LLVM, we most commonly used the PascalCase for field names and less commonly lisp-case. This change uses camelCase which is unusual and unless there's a particular reason for using this scheme such as compatibility with an existing format, I'd prefer keeping the format consistent with the rest of LLVM and using PascalCase or lisp-case.

michaelplatings added a subscriber: peter.smith.
  • Squash D143763 into this change as suggested by @MaskRay
  • Use PascalCase field names as requested by @phosek
  • Applied improvements to error messages suggested by @peter.smith
  • Removed some braces as requested by @MaskRay
  • Moved serialization structs to anonymous namespace as requested by @MaskRay
michaelplatings retitled this revision from [NFC] Multilib YAML parsing to Multilib YAML parsing.Feb 22 2023, 3:40 AM
michaelplatings edited the summary of this revision. (Show Details)

PrintArgs -> PrintOptions

flags -> tags

I've set approved from the Arm side. Please leave some time for people in the US time zone to leave any final comments or ask for extensions. I've left some comments that can be applied prior to commit if you want to take them up.

clang/include/clang/Driver/Multilib.h
63

With the context of https://discourse.llvm.org/t/rfc-multilib/67494 and potential other formats. I think it will be worth making this comment not specific to multilib.yaml. For example if there is another non-YAML DSL that has tags.

For example multilib DSL rather than multilib.yaml

133

With reference to https://discourse.llvm.org/t/rfc-multilib/67494 perhaps rename this to parseMultilibYaml as parse is generic, yet there is no indication it is working on a yaml file here.

peter.smith accepted this revision.Mar 13 2023, 9:28 AM
This revision is now accepted and ready to land.Mar 13 2023, 9:28 AM
michaelplatings marked 2 inline comments as done.

parse -> parseYaml

clang/include/clang/Driver/Multilib.h
63

If I were a newcomer to this code I think I'd find it easier to grep for multilib.yaml so on that basis I'd prefer to stick with the current wording. Should another file format be adopted then the comment could be updated at that point.

133

Agreed. Renamed to parseYaml.

phosek added inline comments.Mar 13 2023, 9:20 PM
clang/lib/Driver/Multilib.cpp
152

Can we use a separate versioning scheme rather than tying the version to Clang? That is start at version 1.0 and bump it every time we make backwards incompatible change. That's the strategy we've for YAML formats used in other tools such as llvm-ifs.

The reason I think that would be preferable is that Clang will be at version N.0.0 for ~9 months; we bump the version when we branch of the release, then spent 6 months developing and then typically another 3 months stabilizing the release branch. So for example, we bump the version to 16.0.0 in July 2022, and that version will get released in March 2023. However, during those 9 months, we may want to make more than one breaking change to the format.

This may not be an issue for projects that only follow stable releases, but there's plenty of major projects that live at HEAD, such as Android, Chromium or Fuchsia, for which this could cause an issue. Using a dedicated versioning scheme allows us to decouple the format evolution from Clang releases.

152

Use llvm::VersionTuple rather than std::string which is intended for this purpose.

michaelplatings marked 2 inline comments as done.

Decouple multilib versioning scheme from the Clang version

phosek added inline comments.May 23 2023, 1:34 AM
clang/include/clang/Driver/Multilib.h
133–135

Rather than parseYaml modifying the state of MultilibSet, can we instead have a static factory method or a standalone function that parses, constructs and returns llvm::Expected<MultilibSet>. That's more in line with how we handle YAML deserialization elsewhere in LLVM, and also follows the notion of MultilibSet being immutable.

clang/lib/Driver/Multilib.cpp
171

This is just a nit, but the error messages in this file are inconsistent. Sometimes they are full sentences (capitalized, ending with a dot), sometimes they are not (not-capitalized, no dot at the end) and in this particular case it's a mixture (not-capitalized, ending with a dot, although it seems like dot here is only used as a separator). I don't have a strong preference for one style over the other, but I think we should pick one and be consistent.

michaelplatings marked 2 inline comments as done.Jun 6 2023, 10:42 AM
michaelplatings added inline comments.
clang/include/clang/Driver/Multilib.h
133–135

Done, although I used ErrorOr instead of Expected to allow passing the error directly from the yaml::Input object.

michaelplatings marked an inline comment as done.

Update unit tests

The only remaining concern I have is about the YAML schema, specifically the naming of Regex, MatchFlags and NoMatchFlags fields. Specifically, I think it something like this might be cleaner:

FlagMap:
- Match: --target=thumbv8(\.[0-9]+)?m\.base-none-unknown-eabi
  Flags: [--target=thumbv6m-none-unknown-eabi]

The fact that the value is a regular expression is implied by the name of the field Match since the term "matching operation" is commonly used with regular expression. For the negative case we could use NoMatch.

I'm open to other ideas though, this is just a suggestion.

@phosek thanks for your suggestion, that's now implemented. In practise for LLVM Embedded Toolchain for Arm we haven't yet needed NoMatchFlags so I've removed that feature.

phosek accepted this revision.Jun 8 2023, 12:58 AM

@phosek thanks for your suggestion, that's now implemented. In practise for LLVM Embedded Toolchain for Arm we haven't yet needed NoMatchFlags so I've removed that feature.

Great, that simplifies things even further. The only other suggestion I have is to maybe rename FlagMap to Mappings for a better consistency with Variants, otherwise LGTM.

FlagMap->Mappings

This revision was automatically updated to reflect the committed changes.