Page MenuHomePhabricator

[SVE] Auto-generate builtins and header for svld1.

Authored by sdesmalen on Mar 2 2020, 9:25 AM.



This is a first patch in a series for the SveEmitter to generate the arm_sve.h
header file and builtins.

I've tried my best to strip down this patch as best as I could, but there
are still a few changes that are not necessarily exercised by the load intrinsics
in this patch, mostly around the SVEType class which has some common logic to
represent types from a type and prototype string. I thought it didn't make
much sense to remove that from this patch and split it up.

Diff Detail

Event Timeline

sdesmalen created this revision.Mar 2 2020, 9:25 AM
Herald added a project: Restricted Project. · View Herald Transcript
sdesmalen marked an inline comment as done.
sdesmalen added inline comments.

I just see that this comment will need updating, as that line seems copied from SveEmitter.cpp.

Big patch, I only did a first scan, but looks very reasonable in general. Just a first round of nits and 2 questions.


This encoding, e.g, this is "csilUcUsUiUlhfd", is such a monstrosity. It's a very efficient encoding, but of course completely unreadable. I know there is prior art, and know that this is how it's been done, but just curious if you have given it thoughts how to do this in a normal way, a bit more c++y. I don't want to de-rail this work, but if we are adding a new emitter, perhaps now is the time to give it a thought, so was just curious.


I wanted to add the nit that SveEmiiter.cpp should perhaps be SVEEmitter.cpp, but then I saw at the bottom that MVE is spelled Mve, so perhaps this is fine then.


why a default of 128? Will this gives problems for SVE implementions with> 128 bits?


nit: for readability, perhaps don't abbreviate some of these member names?

R -> Record
BaseTS -> BaseTypeSpec
CK -> ClassKind


don't need this


just a return here

sdesmalen updated this revision to Diff 248432.Mar 5 2020, 4:12 AM
sdesmalen marked 11 inline comments as done.
  • Renamed CK and BaseTS
  • Refactored switch statementsd in SVEType::getTypeFlags()

Haha, its a bit of a monstrosity indeed. The only thing I can think of here would be having something like:

class TypeSpecs<list<string> val> {
  list<string> v = val;
def All_Int_Float_Ty : TypeSpecs<["c", "s", "i", "l", "Uc", "Us", "Ul", "h", "f", "d">;

def SVLD1 : Minst<"svld1[_{2}]", "dPc", All_Int_Float_Ty, [IsLoad]>;

But I suspect this gets a bit awkward because of the many permutations, I count more than 40. Not sure if that would really improve the readability.


SVE vectors are n x 128bits, so the 128 is scalable here.


Record and ClassKind are also the names of the enum though.
Perhaps I can rename CK to Class?


Good catch!

SjoerdMeijer added inline comments.Mar 5 2020, 5:52 AM

I would personally welcome any improvement here, even the smallest. But if you think it's tricky, then fair enough!

I've managed to completely ignore the MVE intrinsics work so far, but understood there were some innovations here and there (e.g. in tablegen). Probably because it is dealing with similar problems: a lot of intrinsics, some of them overloaded with different types. I'm going to have a little look now to see if there's anything we can borrow from that, or if that is unrelated....


ah, okay, fair enough, didn't realise that.

Adding @simon_tatham in case he feels wants to have a look too.

simon_tatham added inline comments.Mar 5 2020, 7:45 AM

In the MVE intrinsics implementation I completely avoided that entire system of string-based type specifications. There's another completely different way you can set up the types of builtins, and I used that instead.

You can declare the function for Builtins.def purposes with no type specification at all, and then you fill in its type signature using a declaration in the header file, with the unusual combination of __inline__ and no function body:

static __inline__ int32x4_t __builtin_arm_foo_bar(int16x8_t, float23x7t); // or whatever

In fact I went one step further: the user-facing names for the MVE intrinsics are declared in arm_mve.h with a special attribute indicating that they're aliases for clang builtins. And the MVE polymorphic intrinsics are done by adding __attribute__((overloadable)) to the declaration, which allows C++-style overloading based on parameter types even when compiling in C. So when the user invokes an MVE intrinsic by its polymorphic name, the compiler first does overload resolution to decide which declaration in the header file to select; then it looks at the builtin-alias attribute and discovers which internal builtin id it corresponds to; and then it can do codegen for that builtin directly, without a wrapper function in the header.

Pros of doing it this way:

  • if the builtin requires some of its arguments to be compile-time constants, then you don't run into the problem that a wrapper function in the header fails to pass through the constantness. (In NEON this is worked around by making some wrapper functions be wrapper macros instead – but NEON doesn't have to deal with polymorphism.)
  • declaring a builtin's type signature in the header file means that it can include definitions that the header file has created beforehand. For example, one of the arguments to the MVE vld2q family involves a small struct containing 2 or 4 vectors, and it would be tricky to get that struct type into the Builtins.def type specification before the header file can tell clang it exists.
  • doing polymorphism like this, rather than making the polymorphic function be a macro expanding to something involving C11 _Generic, means the error messages are orders of magnitude more friendly when the user messes up a call. (Also it's remarkably fiddly to use _Generic in complicated cases, because of the requirement that even its untaken branches not provoke any type-checking or semantic errors.)
  • I don't know of any way that the preprocessor + _Generic approach can avoid expanding its macro arguments at least twice. It can avoid evaluating twice, so that's safe in the side-effect sense, but you still have the problem that you get exponential inflation of the size of preprocessed output if calls to these macros are lexically nested too deeply.


  • you have to do all of your codegen inside the clang binary, starting from the function operands you're given, and ending up with LLVM IR. You don't get to do the tedious parts (like unpacking structs, or dereferencing pointer arguments for passed-by-reference parameters) in the wrapper function in the header, because there isn't one. I had to invent a whole system in MveEmitter to allow the IR generation to be specified in a not-too-verbose way.
  • if the builtins don't have type declarations until the header is included, then users can't call them without the header file. Probably this is fine for SVE intrinsics the same way it is for MVE, where the builtins are a detail of that particular compiler's implementation and users are intended to use the compiler-independent public API. But in cases where the builtin itself was intended to be called directly by the end user (in the way that __builtin_clz is, for example), you'd probably want it to work everywhere.
  • if you do polymorphism using __attribute__((overloadable)) then all the things you're overloading between have to be real functions. You can't make some of them be macros, with the extra flexibility a macro gives you. (But then, making them builtins rather than genuine functions restores some of that flexibility.)

Off the top of my head I don't know whether all these ideas can be separated from each other. It feels to me as if all the choices I made are leaning on each other and making a mutually supporting whole, and it's quite possible that if you tried to cherry-pick just one of these design decisions into an otherwise more conventional approach, it might all come crashing down. But I haven't tried it :-)

sdesmalen marked an inline comment as done.Mar 6 2020, 10:14 AM

Adding @simon_tatham in case he feels wants to have a look too.

Thanks Sjoerd! @simon_tatham and I had a chat about this offline today.

The SVE implementation now does more or less the same thing the MVE implementation; arm_sve.h also uses __attribute__((overloadable)) and __attribute__((arm_sve_alias("__builtin_..."))), the latter only to declare the overloaded intrinsics. That means we get the same benefits as Simon described.

There are a few details that are different:

  • The MVE implementation *does not* use the type string in the actual Builtins.def. For example, the type string below is not "V16ScV16ScV16Sci", but rather "":

TARGET_HEADER_BUILTIN(__builtin_arm_mve_vcaddq_rot270_f16, "", "n", "arm_mve.h", ALL_LANGUAGES, "")
  • This means that the implementation relies solely on the declaration in arm_mve.h to define the intrinsic prototype.
  • If I understood this correctly, this is currently needed to represent the tuple types (which cannot yet be expressed in the bulitin type string format) returned from structured loads like ld2/ld3/ld4.

The SVE implementation *does* use the type string in Builtins.def.

  • For SVE, the structured loads will just return a wider vector with 2, 3 or 4 times the number of elements, so the lack of support in the builtin type-string format is not an issue.
  • An advantage of that is that we can use a #define svadd_u8(...) __builtin_svadd_u8(...) for the non-overloaded builtins which helps compile-time performance.

If all the intrinsics can be described *with* a builtin type string, we can actually work to define all builtins internally in Clang (as @efriedma suggested in D75298) rather than having to rely on the header file to define the prototypes, hopefully get rid of the expensive header file.
When we do this for SVE, MVE can probably follow the same approach.


Thanks for sharing some background here @simon_tatham!

miyuki added a subscriber: miyuki.EditedMar 9 2020, 9:59 AM

The SVE implementation now does more or less the same thing the MVE implementation; arm_sve.h also uses __attribute__((overloadable)) and __attribute__((arm_sve_alias("__builtin_..."))), the latter only to declare the overloaded intrinsics. That means we get the same benefits as Simon described.

FYI: we are currently working on another M-profile extension, CDE. In my patches am reusing some of the MVE intrinsic machinery, and I decided to rename __clang_arm_mve_alias to __clang_arm_builtin_alias. You might want to reuse the same attribute name to reduce duplication, see

SjoerdMeijer added inline comments.Mar 9 2020, 2:32 PM

I am wondering if it is confusing/correct to use NeonInstrinsicInfo here?


and the same here: findNeon...

sdesmalen updated this revision to Diff 249312.Mar 10 2020, 4:32 AM


sdesmalen marked an inline comment as done.Mar 10 2020, 4:34 AM
sdesmalen added inline comments.

We can reuse the same info-struct and find function, but the names are indeed misleading. I've renamed these.

SjoerdMeijer accepted this revision.Mar 10 2020, 6:20 AM

Cheers, I think this looks very reasonable.

This revision is now accepted and ready to land.Mar 10 2020, 6:20 AM
This revision was automatically updated to reflect the committed changes.
thakis added inline comments.

Update comment to also say "SVE" and "CDE" (or just say "# ARM builtin headers")


Any reason these aren't called -gen-arm-sve-builtin-def and -gen-arm-sve-builtin-codegen for consistency with CDE and MVE?

thakis added inline comments.Mar 16 2020, 6:52 AM

Including stuff from clang/Basic in clang/utils/TableGen is conceptually a layering violation: clang-tblgen is used to generate headers included in clang/Basic. In this case it happens to work, but it's because you're lucky, and it could break for subtle reasons if the TypeFlags header starts including some other header in Basic that happens to include something generated.

Please restructure this so that the TableGen code doesn't need an include from Basic.

sdesmalen marked 2 inline comments as done.Mar 16 2020, 7:40 AM
sdesmalen added inline comments.

Thanks for pointing out! The only directory that seems to have common includes between Clang TableGen/CodeGen is the llvm/Support directory, any objection to me moving the header there?


Not really, I can change that.

This patch broke the Clang build with enabled modules (which is used by the LLDB bot, but every other bot is also dead: )

thakis added inline comments.Mar 16 2020, 9:04 AM

That seems like a strange place for this header.

Maybe you can rework things so that you don't have to share a header between clang's tablegen and clang's Basic? No other tablegen output so far has needed that. (see e.g. the /// These must be kept in sync with the flags in utils/TableGen/NeonEmitter.h. line in TargetBuiltins.h).

If that isn't possible at all, I suppose you could put the .h file in clang/utils/TableGen and also make clang-tblgen write the .h file and use the written .h file in Basic.

thakis added inline comments.Mar 18 2020, 8:35 AM

Looks like you renamed the files to be consistent (thanks!), but not the flag names. Can you make those consistent too?