Page MenuHomePhabricator

Multilib YAML parsing

Authored by michaelplatings on Jan 30 2023, 1:49 PM.



The format includes a ClangMinimumVersion entry to avoid a potential
source of subtle errors if an older version of Clang were to be used
with a multilib.yaml that requires a newer Clang to work correctly.
This feature is comparable to CMake's cmake_minimum_required.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptJan 30 2023, 1:49 PM
michaelplatings requested review of this revision.Jan 30 2023, 1:49 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 30 2023, 1:49 PM
miyuki added a subscriber: miyuki.Feb 1 2023, 5:15 AM
miyuki added inline comments.

Please use the LLVM regular expression engine (llvm/Support/Regex.h) instead of std::regex. It's more portable, e.g., I've seen bugs and performance issues with MSVC implementation that don't manifest on Linux.

miyuki added inline comments.Feb 1 2023, 5:20 AM

Please use llvm::find_if(InFlags, ...)


I think it would make sense to validate regular expression syntax as well.

Incorporated changes requested by @miyuki

michaelplatings marked 3 inline comments as done.Feb 1 2023, 8:32 AM

Move the regular expression flag matching functionality into MutlilibSet. This (a) simplifies the code for loading from multilib.yaml; and (b) makes the flag matching functionality easier to access.

michaelplatings retitled this revision from Multilib YAML parsing to [NFC] Multilib YAML parsing.Feb 6 2023, 3:55 AM
phosek added a comment.Feb 7 2023, 9:23 PM

Do you know if regular expressions are necessary to cover the existing use cases? In our experience, while regular expressions are powerful, they also tend to be error prone and more difficult to reason about. Would glob patterns that are implemented by be sufficient?

Do you know if regular expressions are necessary to cover the existing use cases?

Yes. One use case is mapping later architecture versions to the last architecture version supported by the multilib. For example, this would map armv8 versions after 8.5 to 8.5:

- regex: target=armv8\.([6-9]|[1-9][0-9]+)-none-eabi
  matchFlags: [target=armv8.5-none-eabi]

Make "variants" required

For the existing YAML files within LLVM, we most commonly used the PascalCase for field names and less commonly lisp-case. This change uses camelCase which is unusual and unless there's a particular reason for using this scheme such as compatibility with an existing format, I'd prefer keeping the format consistent with the rest of LLVM and using PascalCase or lisp-case.

michaelplatings added a subscriber: peter.smith.
  • Squash D143763 into this change as suggested by @MaskRay
  • Use PascalCase field names as requested by @phosek
  • Applied improvements to error messages suggested by @peter.smith
  • Removed some braces as requested by @MaskRay
  • Moved serialization structs to anonymous namespace as requested by @MaskRay
michaelplatings retitled this revision from [NFC] Multilib YAML parsing to Multilib YAML parsing.Feb 22 2023, 3:40 AM
michaelplatings edited the summary of this revision. (Show Details)

PrintArgs -> PrintOptions

flags -> tags

I've set approved from the Arm side. Please leave some time for people in the US time zone to leave any final comments or ask for extensions. I've left some comments that can be applied prior to commit if you want to take them up.


With the context of and potential other formats. I think it will be worth making this comment not specific to multilib.yaml. For example if there is another non-YAML DSL that has tags.

For example multilib DSL rather than multilib.yaml


With reference to perhaps rename this to parseMultilibYaml as parse is generic, yet there is no indication it is working on a yaml file here.

peter.smith accepted this revision.Mar 13 2023, 9:28 AM
This revision is now accepted and ready to land.Mar 13 2023, 9:28 AM
michaelplatings marked 2 inline comments as done.

parse -> parseYaml


If I were a newcomer to this code I think I'd find it easier to grep for multilib.yaml so on that basis I'd prefer to stick with the current wording. Should another file format be adopted then the comment could be updated at that point.


Agreed. Renamed to parseYaml.

phosek added inline comments.Mar 13 2023, 9:20 PM

Can we use a separate versioning scheme rather than tying the version to Clang? That is start at version 1.0 and bump it every time we make backwards incompatible change. That's the strategy we've for YAML formats used in other tools such as llvm-ifs.

The reason I think that would be preferable is that Clang will be at version N.0.0 for ~9 months; we bump the version when we branch of the release, then spent 6 months developing and then typically another 3 months stabilizing the release branch. So for example, we bump the version to 16.0.0 in July 2022, and that version will get released in March 2023. However, during those 9 months, we may want to make more than one breaking change to the format.

This may not be an issue for projects that only follow stable releases, but there's plenty of major projects that live at HEAD, such as Android, Chromium or Fuchsia, for which this could cause an issue. Using a dedicated versioning scheme allows us to decouple the format evolution from Clang releases.


Use llvm::VersionTuple rather than std::string which is intended for this purpose.

michaelplatings marked 2 inline comments as done.

Decouple multilib versioning scheme from the Clang version

phosek added inline comments.Tue, May 23, 1:34 AM

Rather than parseYaml modifying the state of MultilibSet, can we instead have a static factory method or a standalone function that parses, constructs and returns llvm::Expected<MultilibSet>. That's more in line with how we handle YAML deserialization elsewhere in LLVM, and also follows the notion of MultilibSet being immutable.


This is just a nit, but the error messages in this file are inconsistent. Sometimes they are full sentences (capitalized, ending with a dot), sometimes they are not (not-capitalized, no dot at the end) and in this particular case it's a mixture (not-capitalized, ending with a dot, although it seems like dot here is only used as a separator). I don't have a strong preference for one style over the other, but I think we should pick one and be consistent.