The format includes a ClangMinimumVersion entry to avoid a potential
source of subtle errors if an older version of Clang were to be used
with a multilib.yaml that requires a newer Clang to work correctly.
This feature is comparable to CMake's cmake_minimum_required.
Details
- Reviewers
- phosek - peter.smith 
- Commits
- rG4794bdab7aed: [Driver] Multilib YAML parsing
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
| clang/lib/Driver/Multilib.cpp | ||
|---|---|---|
| 93 | Please use the LLVM regular expression engine (llvm/Support/Regex.h) instead of std::regex. It's more portable, e.g., I've seen bugs and performance issues with MSVC implementation that don't manifest on Linux. | |
Move the regular expression flag matching functionality into MutlilibSet. This (a) simplifies the code for loading from multilib.yaml; and (b) makes the flag matching functionality easier to access.
Do you know if regular expressions are necessary to cover the existing use cases? In our experience, while regular expressions are powerful, they also tend to be error prone and more difficult to reason about. Would glob patterns that are implemented by https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Support/GlobPattern.h be sufficient?
Yes. One use case is mapping later architecture versions to the last architecture version supported by the multilib. For example, this would map armv8 versions after 8.5 to 8.5:
- regex: target=armv8\.([6-9]|[1-9][0-9]+)-none-eabi matchFlags: [target=armv8.5-none-eabi]
For the existing YAML files within LLVM, we most commonly used the PascalCase for field names and less commonly lisp-case. This change uses camelCase which is unusual and unless there's a particular reason for using this scheme such as compatibility with an existing format, I'd prefer keeping the format consistent with the rest of LLVM and using PascalCase or lisp-case.
I've set approved from the Arm side. Please leave some time for people in the US time zone to leave any final comments or ask for extensions. I've left some comments that can be applied prior to commit if you want to take them up.
| clang/include/clang/Driver/Multilib.h | ||
|---|---|---|
| 63 | With the context of https://discourse.llvm.org/t/rfc-multilib/67494 and potential other formats. I think it will be worth making this comment not specific to multilib.yaml. For example if there is another non-YAML DSL that has tags. For example multilib DSL rather than multilib.yaml | |
| 135 | With reference to https://discourse.llvm.org/t/rfc-multilib/67494 perhaps rename this to parseMultilibYaml as parse is generic, yet there is no indication it is working on a yaml file here. | |
parse -> parseYaml
| clang/include/clang/Driver/Multilib.h | ||
|---|---|---|
| 63 | If I were a newcomer to this code I think I'd find it easier to grep for multilib.yaml so on that basis I'd prefer to stick with the current wording. Should another file format be adopted then the comment could be updated at that point. | |
| 135 | Agreed. Renamed to parseYaml. | |
| clang/lib/Driver/Multilib.cpp | ||
|---|---|---|
| 152 | Can we use a separate versioning scheme rather than tying the version to Clang? That is start at version 1.0 and bump it every time we make backwards incompatible change. That's the strategy we've for YAML formats used in other tools such as llvm-ifs. The reason I think that would be preferable is that Clang will be at version N.0.0 for ~9 months; we bump the version when we branch of the release, then spent 6 months developing and then typically another 3 months stabilizing the release branch. So for example, we bump the version to 16.0.0 in July 2022, and that version will get released in March 2023. However, during those 9 months, we may want to make more than one breaking change to the format. This may not be an issue for projects that only follow stable releases, but there's plenty of major projects that live at HEAD, such as Android, Chromium or Fuchsia, for which this could cause an issue. Using a dedicated versioning scheme allows us to decouple the format evolution from Clang releases. | |
| 152 | Use llvm::VersionTuple rather than std::string which is intended for this purpose. | |
| clang/include/clang/Driver/Multilib.h | ||
|---|---|---|
| 135–137 | Rather than parseYaml modifying the state of MultilibSet, can we instead have a static factory method or a standalone function that parses, constructs and returns llvm::Expected<MultilibSet>. That's more in line with how we handle YAML deserialization elsewhere in LLVM, and also follows the notion of MultilibSet being immutable. | |
| clang/lib/Driver/Multilib.cpp | ||
| 171 | This is just a nit, but the error messages in this file are inconsistent. Sometimes they are full sentences (capitalized, ending with a dot), sometimes they are not (not-capitalized, no dot at the end) and in this particular case it's a mixture (not-capitalized, ending with a dot, although it seems like dot here is only used as a separator). I don't have a strong preference for one style over the other, but I think we should pick one and be consistent. | |
| clang/include/clang/Driver/Multilib.h | ||
|---|---|---|
| 135–137 | Done, although I used ErrorOr instead of Expected to allow passing the error directly from the yaml::Input object. | |
The only remaining concern I have is about the YAML schema, specifically the naming of Regex, MatchFlags and NoMatchFlags fields. Specifically, I think it something like this might be cleaner:
FlagMap: - Match: --target=thumbv8(\.[0-9]+)?m\.base-none-unknown-eabi Flags: [--target=thumbv6m-none-unknown-eabi]
The fact that the value is a regular expression is implied by the name of the field Match since the term "matching operation" is commonly used with regular expression. For the negative case we could use NoMatch.
I'm open to other ideas though, this is just a suggestion.
@phosek thanks for your suggestion, that's now implemented. In practise for LLVM Embedded Toolchain for Arm we haven't yet needed NoMatchFlags so I've removed that feature.
Great, that simplifies things even further. The only other suggestion I have is to maybe rename FlagMap to Mappings for a better consistency with Variants, otherwise LGTM.
With the context of https://discourse.llvm.org/t/rfc-multilib/67494 and potential other formats. I think it will be worth making this comment not specific to multilib.yaml. For example if there is another non-YAML DSL that has tags.
For example multilib DSL rather than multilib.yaml