This is an archive of the discontinued LLVM Phabricator instance.

[Target][AArch64] Allow for char as int8_t in AArch64AsmParser.cpp
ClosedPublic

Authored by ro on Aug 4 2020, 10:02 AM.

Details

Summary

A couple of AArch64 tests were failing on Solaris, both sparc and x86:

LLVM :: MC/AArch64/SVE/add-diagnostics.s
LLVM :: MC/AArch64/SVE/cpy-diagnostics.s
LLVM :: MC/AArch64/SVE/cpy.s
LLVM :: MC/AArch64/SVE/dup-diagnostics.s
LLVM :: MC/AArch64/SVE/dup.s
LLVM :: MC/AArch64/SVE/mov-diagnostics.s
LLVM :: MC/AArch64/SVE/mov.s
LLVM :: MC/AArch64/SVE/sqadd-diagnostics.s
LLVM :: MC/AArch64/SVE/sqsub-diagnostics.s
LLVM :: MC/AArch64/SVE/sub-diagnostics.s
LLVM :: MC/AArch64/SVE/subr-diagnostics.s
LLVM :: MC/AArch64/SVE/uqadd-diagnostics.s
LLVM :: MC/AArch64/SVE/uqsub-diagnostics.s

For example, reduced from MC/AArch64/SVE/add-diagnostics.s:

add     z0.b, z0.b, #0, lsl #8

missed the expected diagnostics

$ ./bin/llvm-mc -triple=aarch64 -show-encoding -mattr=+sve add.s
add.s:1:21: error: immediate must be an integer in range [0, 255] with a shift amount of 0
add     z0.b, z0.b, #0, lsl #8
                    ^

The message is Match_InvalidSVEAddSubImm8, emitted in the generated
lib/Target/AArch64/AArch64GenAsmMatcher.inc for MCK_SVEAddSubImm8.
When comparing the call to ::AArch64Operand::isSVEAddSubImm<char> on both
Linux/x86_64 and Solaris, I find

875	    bool IsByte = std::is_same<int8_t, std::make_signed_t<T>>::value;

is false on Solaris, unlike Linux.

The problem boils down to the fact that int8_t is plain char on Solaris: both the sparc
and i386 psABIs have char as signed. However, with

9887	    DiagnosticPredicate DP(Operand.isSVEAddSubImm<int8_t>());

in lib/Target/AArch64/AArch64GenAsmMatcher.inc, std::make_signed_t<int8_t>
above yieds signed char, so std::is_same<int8_t, char> is false.

This can easily be fixed by also allowing for signed char here and in a few similar places.

Tested on amd64-pc-solaris2.11, sparcv9-sun-solaris2.11, and x86_64-pc-linux-gnu.

Diff Detail

Event Timeline

ro created this revision.Aug 4 2020, 10:02 AM
ro requested review of this revision.Aug 4 2020, 10:02 AM
ro added a comment.Aug 11 2020, 1:22 AM

Ping? It's been a week.

paulwalker-arm added inline comments.Aug 18 2020, 6:10 AM
llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
859

Perhaps a naive question but are there any concerns about "signed char" being bigger than a byte on any systems?

The commit message suggests its the make_signed that is the problem so I'm wondering if instead adding "std::is_same<int8_t, T>::value" would also work?

ro added inline comments.Aug 18 2020, 7:11 AM
llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
859

Perhaps a naive question but are there any concerns about "signed char" being bigger than a byte on any systems?

There shouldn't be: sizeof(char/unsigned char/signed char) is guaranteed to be 1 by C11, 6.5.3.4.

The commit message suggests its the make_signed that is the problem so I'm wondering if instead adding "std::is_same<int8_t, T>::value" would also work?

At least if have found no case where T wasn't a signed type. OTOH maybe s simple sizeof(T) == 1 is enough?

paulwalker-arm accepted this revision.Aug 18 2020, 8:33 AM

I have no real objections to this patch other than I prefer:

bool IsByte = std::is_same<int8_t, std::make_signed_t<T>>::value ||
              std::is_same<int8_t, T>::value;

if that works for your use case, if not then I'm happy with your current solution. Please can you give others a day or so longer to intervene in case they have a better suggestion.

This revision is now accepted and ready to land.Aug 18 2020, 8:33 AM
jyknight added inline comments.Aug 18 2020, 8:52 AM
llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
859

+1 to sizeof(T) == 1, that's a lot better.

ro updated this revision to Diff 286395.Aug 18 2020, 2:07 PM

Just check sizeof(T) == 1.

ro added inline comments.Aug 18 2020, 2:09 PM
llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
859

+1 to sizeof(T) == 1, that's a lot better.

Fully agreed. The updated patch has been tested on x86_64-pc-linux-gnu (2-stage build), sparcv9-sun-solaris2.11, and amd64-pc-linux-gnu (1-stage build with gcc 10).

jyknight added inline comments.Aug 18 2020, 2:49 PM
llvm/lib/Target/AArch64/MCTargetDesc/AArch64AddressingModes.h
769

sizeof(T) == 2 for consistency?

The problem boils down to the fact that int8_t is plain char on Solaris: both the sparc and i386 psABIs have char as signed. However, with ...

So, is -funsigned-char unsupported on Solaris?

paulwalker-arm requested changes to this revision.Aug 19 2020, 2:09 AM

If the consensus is to accept sizeof(T)==1 then so be it, but personally it doesn't sit well with me.

llvm/lib/Target/AArch64/MCTargetDesc/AArch64AddressingModes.h
769

sizeof(float16_t) == 2, which nicely highlights why I don't like the sizeof approach. If not already, someday somebody we'll want to support a 1 byte type that isn't an integer.

This revision now requires changes to proceed.Aug 19 2020, 2:09 AM
ro updated this revision to Diff 286510.Aug 19 2020, 2:10 AM

Use sizeof(T) == 2 check, too.
Update variable name.

ro marked an inline comment as done.Aug 19 2020, 2:14 AM
ro added inline comments.
llvm/lib/Target/AArch64/MCTargetDesc/AArch64AddressingModes.h
778

Renamed IsInt8t: it no longer matches the check and according to my testing signedness didn't matter here.

ro added a comment.Aug 19 2020, 2:26 AM

The problem boils down to the fact that int8_t is plain char on Solaris: both the sparc and i386 psABIs have char as signed. However, with ...

So, is -funsigned-char unsupported on Solaris?

Depends on your definition of "supported", I'd say. The option does exactly what it says. If you only apply it to your own code without interacting with system code (libc etc.), you should be ok. However, as with other ABI-violating options, once you start using system code which by definition must follow the ABI, all bets are off.

Studio 12.6 cc(1) puts it nicely for its equivalent -xchar option:

The -xchar option changes the range of values for the type char only for code compiled with -xchar. This option does not change the range of values for type char in any system routine or header file. In particular, the value of CHAR_MAX and CHAR_MIN, as defined by limits.h, do not change when this option is specified. Therefore, CHAR_MAX and CHAR_MIN no longer represent the range of values encodable in a plain char.

If you use -xchar, be particularly careful when you compare a char against a predefined system macro because the value in the macro may be signed. This is most common for any routine that returns an error code which is accessed through a macro. Error codes are typically negative values so when you compare a char against the value from such a macro, the result is always false. A negative number can never be equal to any value of an unsigned type.

It is strongly recommended that you never use -xchar to compile routines for any interface exported through a library. The Oracle Solaris ABI specifies type char as signed, and system libraries behave accordingly. The effect of making char unsigned has not been extensively tested with system libraries. Instead of using this option, modify your code that it does not depend on whether type char is signed or unsigned. The sign of type char varies among compilers and operating systems.
ro added a comment.Aug 19 2020, 8:44 AM

I have no real objections to this patch other than I prefer:

bool IsByte = std::is_same<int8_t, std::make_signed_t<T>>::value ||
              std::is_same<int8_t, T>::value;

if that works for your use case, if not then I'm happy with your current solution. Please can you give others a day or so longer to intervene in case they have a better suggestion.

I've now tested that variant, too, as before: works just as well.

I'm fine with either solution (although I prefer the sizeof one since it's way shorter/easier to read).

Just let me know when you've reached consensus.

jyknight added inline comments.Aug 24 2020, 1:51 PM
llvm/lib/Target/AArch64/MCTargetDesc/AArch64AddressingModes.h
769

But nobody should call these functions with T not being an integral type in the first place. It wouldn't make sense today or in the future, for what these functions do.

Actually, I'm not sure why these are even functions templated on a integer type T, rather than just passing a second "int ImmBytes" argument. At first glance, that would actually seem clearer all around.

ro updated this revision to Diff 288621.Aug 28 2020, 8:50 AM

Non-template version, passing ImmBytes around.

ro added a comment.Aug 28 2020, 8:54 AM

We've got 3 alternative versions now:

  • my original one as simplified by Paul's suggestion (hadn't uploaded that, but certainly could recreate/retest if that's the selected candidate) Diff 1 in the History
  • the sizeof version, Diff 3 in the History
  • the (partial) non-template version, Diff 4 in the history

I guess it's decision time now ;-)

llvm/lib/Target/AArch64/MCTargetDesc/AArch64AddressingModes.h
769

I've just uploaded a version that does that, tested on sparcv9-sun-solaris2.11 and x86_64-pc-linux-gnu.

My vote hasn't changed, I mean I already accepted the tweaked Option1 so you could have been done and dusted by now :) I just don't see what's gained from Option3's refactoring. It does the same as Option1 but in a less type safe/flexible/c++ way. Furthermore, the new interface is needlessly different to the other functions that handle SVE immediate values when the vector element type plays a role.

@ro Is it worth specifying an end date for votes/opinions?

ro added a comment.Aug 28 2020, 3:24 PM

My vote hasn't changed, I mean I already accepted the tweaked Option1 so you could have been done and dusted by now :) I just don't see what's gained from Option3's refactoring. It does the same as Option1 but in a less type safe/flexible/c++ way. Furthermore, the new interface is needlessly different to the other functions that handle SVE immediate values when the vector element type plays a role.

When differerent reviewers expressed different preferences, I guess I expected they would reach some agreement among themselves. But I'm certainly not keen on dragging this along much longer, let alone try yet another variant ;-) Not being much of a C++ programmer, I can't say much one way or another myself. So I guess I'll give it another day or two for others to chime in and then commit the tweaked Option 1 to finally be done with this.

ro: In this case, I would say to defer to Paul, although I wish that he would change his mind.

ro updated this revision to Diff 288727.Aug 28 2020, 4:23 PM

Actual Option 1 variant.

ro added a comment.Aug 28 2020, 4:24 PM

ro: In this case, I would say to defer to Paul, although I wish that he would change his mind.

Ok, I'll commit the just attached version once testing to guard against stupid typos has completed.

Thanks.

This revision was not accepted when it landed; it landed in state Needs Review.Aug 29 2020, 1:01 AM
This revision was automatically updated to reflect the committed changes.