Page MenuHomePhabricator

[RFC] Create an Arbitrary Precision Integer Type.
Needs ReviewPublic

Authored by erichkeane on Mar 7 2019, 12:28 PM.

Details

Summary

Introduction
As we all know, LLVM-IR supports arbitrary precision integers via the iN
syntax. Integers that aren’t powers-of-two aren’t particularly
interesting/useful on most hardware however, so this functionality isn’t
typically used, and in fact, Clang does not have such support. However,
certain architectures DO exist where the ability to express these values
(and do math without promotion to full integers) is valuable from a
performance perspective, since these values can be expressed more
efficiently in hardware.

I’ve developed a solution to expose these types in clang, and am
proposing to contribute it back to the Clang community if it is
sufficiently acceptable to the group.

Syntax
The syntax I’ve chosen for this is as a Typedef Attribute. This permits
the consumer to define various ways to expose this functionality in
their code. Additionally, we’ve limited it to int/unsigned typedefs
only for simplicity’s sake. The code looks something like:

// Typical way to expose this in C:
typedef int attribute((ap_int(3))) ap_int3;
typedef unsigned
attribute((ap_int(3))) ap_uint3;

// Better way to expose it in modern C++:
template<unsigned bits> using ap_int = int
attribute((ap_int(bits)));
template<unsigned bits> using ap_uint = unsigned
attribute((ap_int(bits)));

For our usages, we just wrapped these in a type that would better
express the further configurable semantics, but I suspect these are
useful in their own right.

Conversions/Promotions
We consider conversions to/from integers integral promotions that follow
normal conversion rules with the exception of automatic promotion to
int, and bool conversions are also supported (just like integers).
AP-Int math is done at the size of the largest operand in order to
prevent promotions that would inflate the size required hardware (such
as in an FPGA).

Size/Align
One important consideration that was made is how size/alignment is
expressed in the language. Sizeof, for example works exclusively on
bytes, so an ap_int<13> couldn’t be directly expressed. Alignments are
required to be a power-of-two, otherwise they are not expressable on
some hardware. In implementation, we chose to use the next largest
alignment (up to 64 bits). Additionally, for the sake of the standard
library (and common array patterns), we are required to make sizeof
also round up to the next power of two. This is most consistent with
what a majority of LLVM backends will do (for example, an ap_int<24>
will be expressed as a int-32 on most platforms), and was required to
make common array patterns work. Unfortunately, this results in arrays
on these platforms having ‘padding’ bits, but sufficiently motivated
code generation can repair this problem.

Diff Detail

Event Timeline

erichkeane created this revision.Mar 7 2019, 12:28 PM
rsmith added a subscriber: rsmith.Mar 7 2019, 2:03 PM

In principle, I think an extension in this space seems reasonable and useful. Considering the points in http://clang.llvm.org/get_involved.html, I'm happy to assume you meet points 2, 5, and 6, but I'd like to see a written-up rationale for the other points. (In particular, you do not yet meet the bar for points 3 and 7, and if you're not proposing this to WG14 or WG21 or similar, some amount of rationale justifying that is necessary.)

I don't like the name ArbPrecIntType; this is not an arbitrary-precision integer type, it's a fixed-width integer type. Something like ExtIntegerType might fit better.

Defining new types via attributed typedefs is a horrible hack, and I would not like to extend its scope any further. Have you considered alternative interfaces for specifying the type? For instance, we could support an __int(N) type-specifier. (I would expect such a specifier to behave like a normal integer width type-specifier (eg, like short or long or __int128), so we would allow things like unsigned __int(54).)

Have you considered whether __int(128) (however you spell it) should be the same type as the existing __int128? (I think it makes sense for the __int(N) types to be distinct from all the standard types, even if N matches, unlike the behavior of the GNU mode attribute. But the relationship to __int128 is less clear.)

How do integral promotions behave on these types? Are these types intended to follow the rules for extended integer types? If not, in what ways, and why not?

I'd like to see some tests for use of these types in these contexts:

  • bit-fields
  • enum-bases
  • non-type template arguments (incl. mangling thereof)
  • UBSan, and in particular -fsanitize=signed-integer-overflow (this will break; UBSan's static data representation assumes that integer types have power-of-two sizes)
  • overload resolution with these types as parameters / arguments, particularly involving overload sets also including standard integer types
  • TBAA metadata (I assume there is no aliasing relationship between these and standard integer types)
include/clang/Basic/AttrDocs.td
4123–4126

Please frame the documentation around the feature itself rather than your motivating use case. Mentioning FPGAs as an aside might make some sense, but it shouldn't be the primary introduction to the extension's documentation.

4139–4141

Please mention that you convert to unsigned if the sizes match, like standard integer types do.

Please also describe the promotion rules for these types.

lib/AST/ASTContext.cpp
1687–1693

This should not be necessary. getTypeInfo should not be returning sizes that are not a multiple of the bit-width of char.

1788

Isn't this wrong for arrays of __attribute__((aligned(N))) types? Eg: https://godbolt.org/z/jOW_B5

(GCC changed this and broke ABI in version 5 onwards, but if/when we follow suit it should be an intentional decision, likely with a -fclang-abi-compat= flag, not a side-effect of an unrelated change such as this one.)

2132

Please do not add another case where a type's size is not a multiple of its alignment. This behavior of __attribute__((aligned(N))) is a mistake that we should not repeat. (Remember that sizeof(T) is supposed to be the size of T as an array element.) Width should include the padding necessary to reach a multiple of the alignment.

2133–2134

The use of 64 as the largest alignment here is arbitrary. Using the alignment of long long would make more sense, but that choice should be made by a method on Target.

5710–5711

This looks wrong: you're ranking __int(2) above long, which violates the integer conversion rank rules for extended integer types. You also give __int128 and __int(128) the same rank despite being different types, which also violates the rank rules. (But I'd suggest fixing that by making them the same type.)

lib/AST/ItaniumMangle.cpp
3377

Please use a reserved vendor mangling.

If you take my suggestion for the source-level interface, something like U5__intILi120Ei might make sense. With your current approach it should be Ua8__ap_intILi120Ei.

lib/Sema/SemaExpr.cpp
9030–9046

This is moderately terrible. I mean, no worse than the existing language rules, but still, moderately terrible. =(

9153–9156

Should this be part of UsualArithmeticConversions instead of being duplicated everywhere?

lib/Sema/SemaOverload.cpp
7399

"vector"

lib/Sema/SemaType.cpp
2074–2076

Use VerifyIntegerConstantExpression here. It'll do contextual implicit conversion to an integer type for you (in C++11 onwards), and properly diagnose why the expression is non-constant if necessary.

2100

We should enforce an upper bound on the size of such a type. (Eg, we shouldn't permit an integer type that doesn't fit in the address space, and we may want a lower limit than that for sanity.)

test/Sema/arb_prec_int.c
18

Why? A signed bitfield can have a width of 1 (representing 0 and -1).

erichkeane marked 7 inline comments as done.Mar 8 2019, 6:40 AM

Thank you @rsmith for the quick review! I really appreciate it.

In principle, I think an extension in this space seems reasonable and useful. Considering the points in http://clang.llvm.org/get_involved.html, I'm happy to assume you meet points 2, 5, and 6, but I'd like to see a written-up rationale for the other points. (In particular, you do not yet meet the bar for points 3 and 7, and if you're not proposing this to WG14 or WG21 or similar, some amount of rationale justifying that is necessary.)

I can definitely work on the test suite, and I'll attempt to improve the specification. At the moment I don't believe we have an effort to get this into WG14/WG21, however I'll see if it is possible to figure out our interest here.

I don't like the name ArbPrecIntType; this is not an arbitrary-precision integer type, it's a fixed-width integer type. Something like ExtIntegerType might fit better.

Defining new types via attributed typedefs is a horrible hack, and I would not like to extend its scope any further. Have you considered alternative interfaces for specifying the type? For instance, we could support an __int(N) type-specifier. (I would expect such a specifier to behave like a normal integer width type-specifier (eg, like short or long or __int128), so we would allow things like unsigned __int(54).)

I'd considered a type specifier, though I'd convinced myself that doing this more akin to the vector types would be more acceptable.

Have you considered whether __int(128) (however you spell it) should be the same type as the existing __int128? (I think it makes sense for the __int(N) types to be distinct from all the standard types, even if N matches, unlike the behavior of the GNU mode attribute. But the relationship to __int128 is less clear.)

In our opinion, int(64) should be different than long/long long/etc, and int(16) shouldn't be short. That is because these types are useful when they don't follow common promotion rules, which force it to promote at in-opportune times (though, moreso with short, the rest were due to creating new rules anyway). Additionally, the ability to overload on these types has been useful to our customers. I'm not sure I understand enough of __int128 to determine whether there is sufficient motivation to make these separate types.

How do integral promotions behave on these types? Are these types intended to follow the rules for extended integer types? If not, in what ways, and why not?

This question feels like a trap :) While I wouldn't dare to claim understanding of the rules, I believe the only difference is exclusion from integral promotion rules.

I'd like to see some tests for use of these types in these contexts:

  • bit-fields
  • enum-bases
  • non-type template arguments (incl. mangling thereof)
  • UBSan, and in particular -fsanitize=signed-integer-overflow (this will break; UBSan's static data representation assumes that integer types have power-of-two sizes)
  • overload resolution with these types as parameters / arguments, particularly involving overload sets also including standard integer types
  • TBAA metadata (I assume there is no aliasing relationship between these and standard integer types)
lib/AST/ASTContext.cpp
1788

Oh dear... I was unaware of aligned. Thats... special.

2132

I think you're right here of course. I believe you've likely noticed I end up doing an 'align-to' all over the place. I think I should have just done it here.

2133–2134

Makes sense.

5710–5711

Hmm... this actually ends up being a pretty interesting thing to try to work through. There is probably some pretty interesting work I'll have to do to make sure they better fit in this list.

lib/Sema/SemaExpr.cpp
9030–9046

I'd hoped to merge this with the usual conversion functions, however they were quite insistent on converting things to "int" when smaller than that, which absolutely trashes FPGA performance.

9153–9156

I ended up pushing this down the stack at one point, and had problems with the all-math-at-int rules, but perhaps this could be put at the beginning of UsualArithmeticConversions.

test/Sema/arb_prec_int.c
18

Well, huh, TIL. That is somewhere between interesting and horrifying.

ebevhan added inline comments.
lib/AST/ASTContext.cpp
1687–1693

Is this really an invariant of getTypeInfo? We have struck upon this issue downstream as well, as we have types which are not a multiple of the char size. The assumption in many places seems to be that when faced with scalar types, getTypeInfo does return the true bit width, not the width in chars*char width. getTypeInfoInChars obviously cannot handle this case, since it uses getTypeInfo as a basis for its calculation.

This is an issue for essentially any non-record, non-array type (such as typedefs or elaborated types) which consists of non-char-width types. Direct array types only work as there is a fast path (getConstantArrayInfoInChars) which does the per-element calculation correctly, and record types work because that calculation is delegated to the record layout builder.

mgehre added a subscriber: mgehre.Oct 11 2019, 4:05 AM