Page MenuHomePhabricator

[builtins] Support architectures with 16-bit int
ClosedPublic

Authored by aykevl on Apr 22 2020, 12:11 PM.

Details

Summary

This is the first patch in a series to add support for the AVR target. This patch includes changes to make compiler-rt more target independent by not relying on the width of an int or long.


Some notes:

  • I changed the integer types used in many builtins (si_int, du_int, etc) to specific integer widths. My rationale is that even though the libgcc documentation uses types such as int / long / long long, I believe their bit width is actually much more strictly defined. For example: LLVM has a file RuntimeLibcalls.def where it uses very specific bit widths for all these builtins.
  • I changed some parameters and return types from si_int to just int when they are used for a number of bits (such as in shift instructions). Affected functions: __ashldi3, __ashrdi3, __ctzdi2, __ffsdi2, __ffssi2, __lshrdi3, __popcountdi2.
  • I changed some bare int/unsigned int types to a more specific si_int/su_int. I think this has always been an issue but nobody noticed because they used to be the same. Affected functions: __floatsidf, __floatunsidf.
  • I changed a few zero-counting builtin calls (__builtin_clz, __builtin_ctzl, etc) to use a bigger integer width. This seems to work, but perhaps it's better to define macros for these in int_lib.h: this change may have performance implications.

For context: I have a branch here where I collect all the patches needed to add support for AVR in the compiler-rt builtins. Some commits are already upstream, some are still in review, and some need some cleanup before I can post them on Phabricator.

Diff Detail

Event Timeline

aykevl created this revision.Apr 22 2020, 12:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 22 2020, 12:11 PM
Herald added subscribers: Restricted Project, dylanmckay. · View Herald Transcript
aykevl edited the summary of this revision. (Show Details)Apr 22 2020, 12:21 PM

Is making all shift amounts "int" consistent with what LLVM does?

Maybe it would make sense to stick macros in int_types.h to call the right __builtin_clz/__builtin_ctz variant for si_int?

compiler-rt/lib/builtins/floatdisf.c
29

si_int?

compiler-rt/lib/builtins/floatundisf.c
27

si_int?

compiler-rt/lib/builtins/fp_extend.h
24

Maybe #if UINT_MAX < 0xFFFFFFFF, since that's what we actually care about.

aykevl added a comment.EditedApr 23 2020, 5:06 AM

Is making all shift amounts "int" consistent with what LLVM does?

So I've been looking into it a bit and found that LLVM appears to truncate these numbers to i8 before calling the function. Which makes sense, because with that you can still shift large numbers with just an i8. I think it doesn't matter which integer width it is in most calling conventions, especially as it is the last parameter to be passed.
Source code for test: https://llvm.godbolt.org/z/xDaNQk
As you can see, only shifts of type i16 or i8 lack mov instructions to get the shift amount in the right place. And you can see that these are mov instructions (for 8-bit moves), not movw instructions (for 16-bit moves).
EDIT: also note that the AVR calling convention aligns values passed in registers to two bytes, so it will use a register pair even for an i8. That's why an i16 works.

I have also compiled LLVM with the following patch:

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index 453500aa9e5..2ff67feb99d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -3528,6 +3528,8 @@ void DAGTypeLegalizer::ExpandIntRes_Shift(SDNode *N,
   }
 
   if (LC != RTLIB::UNKNOWN_LIBCALL && TLI.getLibcallName(LC)) {
+    printf("shift libcall operand 1:\n");
+    N->getOperand(1).dump();
     SDValue Ops[2] = { N->getOperand(0), N->getOperand(1) };
     TargetLowering::MakeLibCallOptions CallOptions;
     CallOptions.setSExt(isSigned);

For the above example I get the following output on AVR:

shift libcall operand 1:
t23: i8 = truncate t6
shift libcall operand 1:
t9: i8 = truncate t6
shift libcall operand 1:
t6: i8,ch = CopyFromReg t0, Register:i8 %2
shift libcall operand 1:
t45: i8 = truncate t10
shift libcall operand 1:
t38: i8 = truncate t10
shift libcall operand 1:
t15: i8 = truncate t10
shift libcall operand 1:
t10: i8,ch = CopyFromReg t0, Register:i8 %4

Notice that all shift widths are of type i8.
Unfortunately I don't have any other architectures compiled in that need such shifts (such as msp430).

In summary: I think using int is fine as LLVM internally passes an i8. It might even make sense to use unsigned char (to more closely match LLVM), but I don't think it's needed for the currently supported architecutres.

aykevl updated this revision to Diff 259548.Apr 23 2020, 6:45 AM
  • defined two macros clzsi and ctzsi for counting bits in si_int and su_int
  • changed int32_t to si_int in two places

I believe this addresses all review comments.
I'm not entirely sure about the clzsi/ctzsi macros. The name seems clear enough, but perhaps they are too close to __clzsi2/__ctzsi2.

In summary: I think using int is fine as LLVM internally passes an i8

That doesn't seem quite right. Consider, for example, long long a(long long a, int b) { return a << (char)(b); }. To generate correct code, you need to clear the high bits of b. If we just model the argument as an anyext i8, we'll skip clearing those bits, and the compiler-rt C implementations will have undefined behavior. It looks like LLVM does actually extend the value to "int", though. (It doesn't choose between sign/zero extend consistently, but it doesn't actually matter here).

I tried to gather some other relevant evidence from building some examples with gcc. Apparently popcount and count leading/trailing zeros do actually return "int", always. This change is consistent with that; we probably should fix LLVM. (LLVM doesn't usually use the libcalls for those, but it can in some cases, I think.)

As for this patch itself, LGTM, but I'd like a second set of eyes on it.

This passed our internal tests, so I am also fine with the change.

efriedma accepted this revision.Apr 24 2020, 1:39 PM
This revision is now accepted and ready to land.Apr 24 2020, 1:39 PM
This revision was automatically updated to reflect the committed changes.

In summary: I think using int is fine as LLVM internally passes an i8

That doesn't seem quite right. Consider, for example, long long a(long long a, int b) { return a << (char)(b); }. To generate correct code, you need to clear the high bits of b. If we just model the argument as an anyext i8, we'll skip clearing those bits, and the compiler-rt C implementations will have undefined behavior. It looks like LLVM does actually extend the value to "int", though. (It doesn't choose between sign/zero extend consistently, but it doesn't actually matter here).

Interesting, where did you find this?
I tried what would happen on AVR, and it doesn't seem to extend the integer. This could be a backend bug, though.
https://llvm.godbolt.org/z/6RgqVi

define i32 @shift8(i32 %a, i8 %b) {
  %b32 = zext i8 %b to i32
  %result = lshr i32 %a, %b32
  ret i32 %result
}
shift8:
        call    __lshrsi3
        ret

In any case, this patch shouldn't have changed any of that as there was only a change from si_int to int, and they were equivalent before.

Interesting, where did you find this?

I was trying with msp430; I had trouble getting a testcase to build with AVR (I forget why). I guess AVR actually behaves differently: it doesn't respect "zeroext" markings on arguments, unlike most targets.

Given that, there's probably a bug we need to fix, in either the AVR backend or type legalization.

uabelho added inline comments.
compiler-rt/lib/builtins/udivmoddi4.c
85

Did you consider changing this call into the new macro ctzsi?

Our downstream target also has ints that are 16 bits wide and we've had some downstream changes somewhat similar to this patch, but then we also changed this call (and another one to builtin_ctz in this file) to builtin_ctzl.

aykevl marked an inline comment as done.May 7 2020, 11:20 AM

I was trying with msp430; I had trouble getting a testcase to build with AVR (I forget why). I guess AVR actually behaves differently: it doesn't respect "zeroext" markings on arguments, unlike most targets.

Given that, there's probably a bug we need to fix, in either the AVR backend or type legalization.

I see. There is definitely a chance that AVR doesn't respect zeroext. I know there are still bugs in the backend that haven't been fixed yet. So far, ABI compatibility bugs haven't been high on my priority list, as long as LLVM-compiled code works with other LLVM-compiled code.

compiler-rt/lib/builtins/udivmoddi4.c
85

I only changed the places that needed changes to get the tests to pass. This file should have been tested together with all other builtins so I think it works on AVR. Of course, if it needs to be changed for your target feel free to send a patch :)

I changed the integer types used in many builtins (si_int, du_int, etc) to specific integer widths. My rationale is that even though the libgcc documentation uses types such as int / long / long long, I believe their bit width is actually much more strictly defined.

When porting compiler-rt builtins to MSP430 and wondering why my locally built clang_rt turns out to differ from libgcc in ABI, I finally read an introduction comment one level above explicitly mentioning:

These routines take arguments and return values of a specific machine mode, not a specific C type. See Machine Modes, for an explanation of this concept. For illustrative purposes, in this chapter the floating point type float is assumed to correspond to SFmode; double to DFmode; and long double to both TFmode and XFmode. Similarly, the integer types int and unsigned int correspond to SImode; long and unsigned long to DImode; and long long and unsigned long long to TImode.