This is an archive of the discontinued LLVM Phabricator instance.

RFC: Multilib prototype
AbandonedPublic

Authored by michaelplatings on Jan 4 2023, 1:27 AM.

Details

Reviewers
None
Summary

This is a prototype of a new approach to multilib.

The multilib.yaml file is a new interface that allows customising (a) which system library variants are available; and (b) how a library variant is identified as compatible with the clang arguments provided by the user.

The file clang/test/Driver/Inputs/baremetal_multilib/arm-none-eabi/multilib.yaml is an example that contains comments explaining how it is used.

The project I'm working on is LLVM Embedded Toolchain for Arm so the design is made with that in mind, but it's intended to be flexible enough to work with other toolchains with many system library variants.

RFC thread is here: https://discourse.llvm.org/t/rfc-multilib/67494

Diff Detail

Event Timeline

michaelplatings created this revision.Jan 4 2023, 1:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 4 2023, 1:27 AM
michaelplatings requested review of this revision.Jan 4 2023, 1:27 AM
michaelplatings edited the summary of this revision. (Show Details)Jan 4 2023, 3:19 AM
phosek added a subscriber: phosek.Jan 4 2023, 9:40 AM
phosek added inline comments.
clang/include/clang/Driver/Multilib2.h
33–34

I think it'd really help readability and comprehension if this was modeled using types (that is structs and classes). It's not at all clear what these strings are supposed to represent, and seeing expressions like std::get<0> doesn't help either.

clang/test/Driver/Inputs/baremetal_multilib/arm-none-eabi/multilib.yaml
24

I understand how the second section is used to match arguments passed to clang -cc1 and turn those into attributes which are then used to find the right variant, but I don't understand what this list of arguments is used for?

michaelplatings added inline comments.Jan 5 2023, 6:55 AM
clang/include/clang/Driver/Multilib2.h
33–34

I fully agree. Sorry if I wasn't clear in the RFC but this is the kind of thing I meant when I said the code is not intended to be production quality.

clang/test/Driver/Inputs/baremetal_multilib/arm-none-eabi/multilib.yaml
24

Having these here permits clang -print-multi-lib to work, which would print out something like this:

thumb/nofp;@mthumb@mfloat-abi=soft
thumb/v7/nofp;@mthumb@march=armv7@mfloat-abi=soft
...

Did you look at Yaml I/O?

"Make difficult things possible": perhaps it might be useful to make sure it's at least possible to express a complex boolean function of the basic predicates, even if it's cumbersome? (So that, for example, you could make an attribute conditional on "this cc1 option but not that one".) I assume YAML would let you write something like an expression AST in hierarchical form, with predicates like regex: at the leaves, and internal nodes for AND, OR and NOT.

Then again, perhaps it's enough to just make sure there's extension room in the syntax so that this can be added later, and there's no need to actually implement it in the first version.

clang/test/Driver/Inputs/baremetal_multilib/arm-none-eabi/multilib.yaml
75

Nit: don't forget to document whether these use basic or extended regex syntax

bsmith added a subscriber: bsmith.Jan 20 2023, 3:00 AM

A constraint is that the user should be able to choose the library with a single list of arguments, as displayed with -print-multi-lib. Therefore I'm sceptical about making the selection mechanism too expressive, because -print-multi-lib can't explain that to the user. Already the presence of negation is stretching that.

Nevertheless let's pick an example of something that can't be directly expressed in the prototype: NOT (a AND b). (I don't know why you'd want to do that, if you have a motivating example of something genuinely useful then that would be better)

The expression can be rearranged to (a AND NOT b) OR (b AND NOT a) OR ((NOT a) AND NOT b) which can be represented in an ugly way:

variants:
- path: p
  args: [-a]
  attrs: [a, not-b]
- path: p
  args: [-b]
  attrs: [not-a, b]
- path: p
  args: []
  attrs: [not-a, not-b]
arguments:
- regex: -a
  matchAttrs: [a]
  noMatchAttrs: [not-a]
- regex: -b
  matchAttrs: [b]
  noMatchAttrs: [not-b]

I think this fits with the spirit of "Make the easy things easy, and the hard things possible."

perhaps it's enough to just make sure there's extension room in the syntax so that this can be added later

One of the many nice things about JSON & YAML is it gives you loads of room for extension, so yes. For example, currently attrs must be an array, but the option could be added later that it could be an object like attrs: {not: [a, b]}. But I hope it doesn't prove necessary ;)

lenary added a subscriber: lenary.Jan 23 2023, 4:00 AM

Thanks for putting together this proposal.

Broadly, I think it's good, and I think it solves a lot of rough edges that I recall running into with the GCC multilib configuration files in the past.

I do worry about using -cc1 args. On the one hand, it's the best place for getting access to information about e.g. exceptions, floating point abi, and other concepts (optimisation level?), but on the other hand, it is a reasonably unstable interface, and it's parsed in a complex and stateful way -- later options can override earlier ones -- which makes it difficult to parse with regular expressions.

To be clear: I don't think there's a better way of handling exceptions/no-exceptions library selection (than parsing the -cc1 arguments) or the floating point abi library selection, so having a way to examining the -cc1 arguments is a necessary "lowest common denominator" interface (I'll come back to this later), but I think there are places we can do better.

In terms of finding a more stable interface, the Arm ACLE recently added Beta support for Function Multi-Versioning which is proposing a lot of stable identifiers for architectural features in v8-a and v8-r onwards. These names are expected to be written by programmers in their source code, so are a stable interface in a way that -target-feature +<name> is absolutely not. These are resolved by clang in order to handle duplicating the same function to compile it for multiple targets, which to my mind is the exact same problem that multi-libs are trying to solve. We have specified stable names for a lot of features in the ACLE (and presumably will continue to do so), as well as a priority system for resolving how to choose which function should be used at runtime. I concede that these stable names do not cover M-profile or pre-v8 architectures, which are important to embedded toolchains.

The information about these stable feature names is known about by AArch64TargetInfo right now, and we are looking at where exactly this information should be best understood by LLVM in general (maybe the TargetParser library). Given where it is in clang's code, I think it should be usable by both the driver and internally inside -cc1, but I'm not 100% sure the C++ interface is perfect for both. But, we can always revisit the C++ interface to make it work for what we want.

So, to concretely turn this into a proposal, I think there are two things you could do:

  • in the arguments: part of the yaml structure, you currently have regex: which is for matching on the -cc1 arguments, we could have a archFeatures: for matching on a list/set of named features (instead of the regex:), to turn them into multilib library attributes.
  • treat the stable architectural feature names like extra implicit attributes that don't show up in the arguments: section, but can be used in the variants: section like the -cc1-derived attributes. Here, we would need to be careful the architectural feature names didn't clash with the user-provided names from the arguments: section, and we might need to have a way to specify the absence of architectural features as well. I'm not 100% convinced that the implicit-ness of this option is good, but I think the stability wins are useful.

To cover some drawbacks of the approach of using these stable names:

  • Not all targets support function multi-versioning. In this case, they are welcome to fall back to using -cc1 arguments, as a lowest common denominator approach.
  • Not all multilibs are chosen based on architectural features: again, you can fall-back to -cc1 arguments, especially as I think e.g. -fno-exceptions is more stable than -target-feature +<name>.
  • Even for Arm/AArch64, there are pre-v8-a/v8-r architectures which don't have stable names in the ACLE yet. Again, in the short term we could use -cc1 arguments, before we stabilise architectural names.

How the multilib system matches user arguments is the main problem I've been struggling with.

These are the options I see:

  1. Grant the multilib system full access to all cc1 arguments, as demonstrated by the current code. However declare the multilib system unstable - if you write a multilib specification, you are responsible for making sure it's updated when clang is updated. Clang plugins set the precedent for such an approach.

    Advantage: It's powerful.

    Disadvantages:
    • Experience with clang plugins suggests instability will be a source of frequent pain.
    • @lenary points out that the order of arguments can be significant.
    • Requires some major refactoring of how arguments are processed before this can be done in a robust way (how I implemented it in the prototype is far from robust).
  1. Pass to the multilib system only the arguments as provided by the user.

    Advantage: Greatly reduced API surface.

    Disadvantage: whoever writes a multilib specification will have to reinvent a lot of logic to infer attributes from --target, -march, -mcpu etc. but with inadequate tools to do so.
  1. Normalise user arguments before passing them to the multilib system.

    Advantages:
    • Uses the existing arguments API.
    • Powerful, if it can be done correctly.

      Disadvantage:
    • The normalisation process effectively becomes a new API.
    • Normalisation appears hard to do correctly = likelihood of bugs.
    • Identifying attributes from arguments is fiddly e.g. extracting Arm architecture extensions from the march argument.
    • It's too easy to write a regex that could unintentionally match a newly introduced argument so the instability problem remains somewhat.
  1. Hard-code the attributes and how they are detected in the same way they already are right now. For Arm this could be extended with Function Multi Versioning as @lenary pointed to, with namespacing to avoid potential clashes with other names. So you could have attributes like [fsanitize=address, fno-exceptions, mfloat-abi=softfp, march=thumbv8.1m.main, arm:mve].

    Advantages:
    • This is similar to what we already do for multilib so it's well understood.
    • Limited yet extensible API.

      Disadvantages:
    • Greatly limits the expressiveness of the multilib system.
    • Any time someone wants to base their multilib on a new attribute it requires a change to Clang.
    • Some regular expressions still required for forward compatibility e.g. with future versions of Armv8-M.
  1. A combination of (1) and (4) as proposed by @lenary.

    Advantages:
    • Stability where predefined attributes are available.
    • The power to access unstable arguments if necessary.

      Disadvantages:
    • Same as for (1) - requires some major refactoring of how arguments are processed before this can be done in a robust way (how I implemented it in the prototype is far from robust).
    • Greater maintenance burden.

None of these options are great but (4) is low risk and extensible so I'm currently investigating that.

Joe added a subscriber: Joe.Feb 1 2023, 1:47 AM

FWIW I tried this patch out with RISC-V multilibs, and it works a treat. It solves the multilib reuse problem and relaxes the order of non-standard extensions. I did have to copy some of the logic from BareMetal.cpp to Gnu.cpp as was using a GCC installation.

The args field for the YAML was unnecessary and worked (including --print-multi-lib) just fine without it.

michaelplatings abandoned this revision.Feb 1 2023, 2:10 AM

I've created a new stack of changes taking into account the feedback. Unlike this change which was strictly a prototype, the new changes should be suitable for detailed review and hopefully approval.

I'm working on further patches to enable multilib layering, but that needn't block the above changes.

Thanks for giving it a try @Joe. The new changes are by design much more limited, in order to avoid creating an unstable API. I'd be interested to learn from you whether it's suitable for what you need, possibly by adding more functionality. In particular you'll see that https://reviews.llvm.org/D142933 has functionality specific to Arm extensions so something similar may be required for other architectures.

Did you look at Yaml I/O?

Thanks for the tip, I've used it in https://reviews.llvm.org/D142932 and it's a great improvement.