This is an archive of the discontinued LLVM Phabricator instance.

[demangler] PPC and S390: Fix parsing of e-prefixed long double literals
ClosedPublic

Authored by xingxue on Feb 6 2020, 1:36 PM.

Details

Summary

This patch is to fix the parsing of long double literals encoded with the e prefix on PowerPC and S390. For both PowerPC and S390, type code e is used for 64-bit long double literals and g is used for 128-bit long double literals. libcxxabi test case test_demangle.pass.cpp fails without the fix.

Diff Detail

Event Timeline

xingxue created this revision.Feb 6 2020, 1:36 PM
libcxxabi/src/cxa_demangle.cpp
9

As information, this will block building the demangler on AIX with the xlclang++ invocation of the IBM XL compiler.

10

This may need to also guard against building with __LONG_DOUBLE_IEEE128__.

libcxxabi/src/demangle/ItaniumDemangle.h
4242

As information, the status quo on AIX is that e is used for both 64-bit and 128-bit long double.

For S390, type code 'g' is used for 128-bit long double literals

There is also a historical mode of using IEEE-64 as "long double" on s390, where we also used 'e' to mangle it. However, this hasn't been used anywhere for the last 15 years or so. It is still supported via -mlong-double-64 in GCC; I see clang also supports this argument, but I'm not actually sure it works on s390 (I've certainly never tested it).

There is also a historical mode of using IEEE-64 as "long double" on s390, where we also used 'e' to mangle it. However, this hasn't been used anywhere for the last 15 years or so. It is still supported via -mlong-double-64 in GCC; I see clang also supports this argument, but I'm not actually sure it works on s390 (I've certainly never tested it).

This implies the changes for PowerPC here would also apply to s390: The e parsing should go through the double value decoding, etc.

xingxue updated this revision to Diff 243223.Feb 7 2020, 10:36 AM
xingxue edited the summary of this revision. (Show Details)

Addressed comments:

  • Guard against building with __LONG_DOUBLE_IEEE128__ on PowerPC so that it only builds in the double-double 128-bit long double mode.
  • Treat type code 'e' as 64-bit long double on S390 as well.
xingxue marked an inline comment as done.Feb 7 2020, 10:37 AM
thakis added a subscriber: thakis.Feb 10 2020, 10:10 AM

Making this dependent on preprocessor defines means you can only demangle when the target binary matches the host binary. That doesn't seem like a feasible approach.

Making this dependent on preprocessor defines means you can only demangle when the target binary matches the host binary. That doesn't seem like a feasible approach.

The patch here is following existing practice in the codebase. The targets binary versus host binary issue already exists with respect to FloatData<long double>.

@uweigand, Hi, I've addressed your comments. Any further comments?

@uweigand, Hi, I've addressed your comments. Any further comments?

I'm not very familiar with this code base. However, I am somewhat confused by your proposed change to "parseExprPrimary". In particular, where you now parse 'e' literals as "double" on powerpc/s390x, and 'g' literals as "long double" everywhere. This seems incorrect to me.

'e' literals really should be of "long double" *type* always. It's just that on powerpc and s390x, in an old ABI selected via -mlong-double-64, the "long double" type was implemented as IEEE-64 (just like "double", but the type is nevertheless still "long double", not "double").

'g' literals on the other hand really should be of type "long double" only on powerpc and s390x; on some other platforms, in particular x86, they should be of type "__float128" (on yet other platforms, 'g' is not used at all).

But what confuses me even more is how this whole routine is even supposed to work in non-native mode: it just uses the native "double" or "long double" types, but the host implementation of those may be different from the one active on the target (whether this is because of cross-compilation to another target, or simply to another ABI mode like -mlong-double-64 vs. -mlong-double-128). Is this routine only ever to be called natively?

Hi @uweigand, Thanks for your comments. Please see my explanations below.

@uweigand, Hi, I've addressed your comments. Any further comments?

I'm not very familiar with this code base. However, I am somewhat confused by your proposed change to "parseExprPrimary". In particular, where you now parse 'e' literals as "double" on powerpc/s390x, and 'g' literals as "long double" everywhere. This seems incorrect to me.

In mangled names, floating-point literals are encoded using a fixed-length lowercase hexadecimal string corresponding to the internal representation, high-order bytes first. For example, float literal -1.0f is encoded as "fbf800000". For a 64-bit long double literal on powerpc and s390x, the encoded form is type code 'e' followed by 16 hexadecimal digits. For a 128-bit long double literal on powerpc and s390x, the encoded form is type code 'g' followed by 32 hexadecimal digits. So, the proposed the change allows the parser to treat type code 'e' as a double (64-bit) and take the following 16 hexadecimal digits as the internal representation of the literal, instead of treating it as a 128-bit long double and looking for 32 hexadecimal digits after it. When the type code is 'g', the parser will be looking for 32 hexadecimal digits. These are changes for parsing literals in the mangled names.

'e' literals really should be of "long double" *type* always. It's just that on powerpc and s390x, in an old ABI selected via -mlong-double-64, the "long double" type was implemented as IEEE-64 (just like "double", but the type is nevertheless still "long double", not "double").

When printing out the demangled names, it still prints out "long double" for type code 'e' as usual, and "long double" for type code 'g' for powerpc and s390x (see lines 3789-3801 of the code).

'g' literals on the other hand really should be of type "long double" only on powerpc and s390x; on some other platforms, in particular x86, they should be of type "__float128" (on yet other platforms, 'g' is not used at all).

Right, please see lines 3793-3801 of the proposed change.

But what confuses me even more is how this whole routine is even supposed to work in non-native mode: it just uses the native "double" or "long double" types, but the host implementation of those may be different from the one active on the target (whether this is because of cross-compilation to another target, or simply to another ABI mode like -mlong-double-64 vs. -mlong-double-128). Is this routine only ever to be called natively?

__cxa_demangle() is a routine in runtime library libc++abi that is only built natively.

Hi @uweigand, Thanks for your comments. Please see my explanations below.

@uweigand, Hi, I've addressed your comments. Any further comments?

I'm not very familiar with this code base. However, I am somewhat confused by your proposed change to "parseExprPrimary". In particular, where you now parse 'e' literals as "double" on powerpc/s390x, and 'g' literals as "long double" everywhere. This seems incorrect to me.

In mangled names, floating-point literals are encoded using a fixed-length lowercase hexadecimal string corresponding to the internal representation, high-order bytes first. For example, float literal -1.0f is encoded as "fbf800000". For a 64-bit long double literal on powerpc and s390x, the encoded form is type code 'e' followed by 16 hexadecimal digits. For a 128-bit long double literal on powerpc and s390x, the encoded form is type code 'g' followed by 32 hexadecimal digits. So, the proposed the change allows the parser to treat type code 'e' as a double (64-bit) and take the following 16 hexadecimal digits as the internal representation of the literal, instead of treating it as a 128-bit long double and looking for 32 hexadecimal digits after it. When the type code is 'g', the parser will be looking for 32 hexadecimal digits. These are changes for parsing literals in the mangled names.

So I guess the point I was missing is that "treat as double" / "treat as long double" here simply means to decode the byte sequence as if it were a native value of that type (in default compilation mode). Then the change for powerpc and s390x does indeed look correct to me.

'e' literals really should be of "long double" *type* always. It's just that on powerpc and s390x, in an old ABI selected via -mlong-double-64, the "long double" type was implemented as IEEE-64 (just like "double", but the type is nevertheless still "long double", not "double").

When printing out the demangled names, it still prints out "long double" for type code 'e' as usual, and "long double" for type code 'g' for powerpc and s390x (see lines 3789-3801 of the code).

OK.

'g' literals on the other hand really should be of type "long double" only on powerpc and s390x; on some other platforms, in particular x86, they should be of type "__float128" (on yet other platforms, 'g' is not used at all).

Right, please see lines 3793-3801 of the proposed change.

I'm still wondering about Intel. Can there ever be a literal encoded using 'g' on Intel? If yes, then treating it as "long double" would still be wrong, because 'g' encodes IEEE128 (__float128), while "long double" is the Intel extended (80-bit) format, right?

On the other hand, if 'g' encoded literals can never happen on Intel (or other platforms), maybe it would be better to have the code handling 'g' within a #ifdef section only active on powerpc and s390?

But what confuses me even more is how this whole routine is even supposed to work in non-native mode: it just uses the native "double" or "long double" types, but the host implementation of those may be different from the one active on the target (whether this is because of cross-compilation to another target, or simply to another ABI mode like -mlong-double-64 vs. -mlong-double-128). Is this routine only ever to be called natively?

__cxa_demangle() is a routine in runtime library libc++abi that is only built natively.

OK, got it. Even more so, I guess we must also ensure that it is only built using the default compiler setting (e.g. it is built with the default -mlong-double-128 on s390x, not with -mlong-double-64). But that's probably a reasonable assumption.

I'm still wondering about Intel. Can there ever be a literal encoded using 'g' on Intel? If yes, then treating it as "long double" would still be wrong, because 'g' encodes IEEE128 (__float128), while "long double" is the Intel extended (80-bit) format, right?

A literal such as 1.Q would appear as g3fff0000000000000000000000000000. As would 1.L with -mlong-double-128.

__cxa_demangle() is a routine in runtime library libc++abi that is only built natively.

OK, got it. Even more so, I guess we must also ensure that it is only built using the default compiler setting (e.g. it is built with the default -mlong-double-128 on s390x, not with -mlong-double-64). But that's probably a reasonable assumption.

The patch already adds a preprocessor check for that on Power. I've added an inline comment to do the same for s390x.

libcxxabi/src/cxa_demangle.cpp
11

A similar error for building with 64-bit long double on s390x would be appropriate.

I'm still wondering about Intel. Can there ever be a literal encoded using 'g' on Intel? If yes, then treating it as "long double" would still be wrong, because 'g' encodes IEEE128 (__float128), while "long double" is the Intel extended (80-bit) format, right?

On the other hand, if 'g' encoded literals can never happen on Intel (or other platforms), maybe it would be better to have the code handling 'g' within a #ifdef section only active on powerpc and s390?

For X86, 'e' is used for 80-bit long double and 'g' is used for 128-bit long double. The following is the code in Clang.

clang/lib/Basic/Targets/X86.h
....
const char *getLongDoubleMangling() const override {
  return LongDoubleFormat == &llvm::APFloat::IEEEquad() ? "g" : "e";
}
...

I'm still wondering about Intel. Can there ever be a literal encoded using 'g' on Intel? If yes, then treating it as "long double" would still be wrong, because 'g' encodes IEEE128 (__float128), while "long double" is the Intel extended (80-bit) format, right?

On the other hand, if 'g' encoded literals can never happen on Intel (or other platforms), maybe it would be better to have the code handling 'g' within a #ifdef section only active on powerpc and s390?

For X86, 'e' is used for 80-bit long double and 'g' is used for 128-bit long double. The following is the code in Clang.

clang/lib/Basic/Targets/X86.h
....
const char *getLongDoubleMangling() const override {
  return LongDoubleFormat == &llvm::APFloat::IEEEquad() ? "g" : "e";
}
...

But then this patch must be incorrect, given that it does

return getDerived().template parseFloatingLiteral<long double>();

for both the 'e' and 'g' cases on Intel. Now I guess it depends on how the file is compiled (with -mlong-double-80 or -mlong-double-128), but either way, one of the cases will be handled incorrectly.

But then this patch must be incorrect, given that it does

return getDerived().template parseFloatingLiteral<long double>();

for both the 'e' and 'g' cases on Intel. Now I guess it depends on how the file is compiled (with -mlong-double-80 or -mlong-double-128), but either way, one of the cases will be handled incorrectly.

Agreed. The addition of the 'g' handling for cases other than the specific platforms mentioned by the title for the patch is outside the intended scope and should probably be avoided.

I'm still wondering about Intel. Can there ever be a literal encoded using 'g' on Intel? If yes, then treating it as "long double" would still be wrong, because 'g' encodes IEEE128 (__float128), while "long double" is the Intel extended (80-bit) format, right?

On the other hand, if 'g' encoded literals can never happen on Intel (or other platforms), maybe it would be better to have the code handling 'g' within a #ifdef section only active on powerpc and s390?

For X86, 'e' is used for 80-bit long double and 'g' is used for 128-bit long double. The following is the code in Clang.

clang/lib/Basic/Targets/X86.h
....
const char *getLongDoubleMangling() const override {
  return LongDoubleFormat == &llvm::APFloat::IEEEquad() ? "g" : "e";
}
...

But then this patch must be incorrect, given that it does

return getDerived().template parseFloatingLiteral<long double>();

for both the 'e' and 'g' cases on Intel. Now I guess it depends on how the file is compiled (with -mlong-double-80 or -mlong-double-128), but either way, one of the cases will be handled incorrectly.

But then this patch must be incorrect, given that it does

return getDerived().template parseFloatingLiteral<long double>();

for both the 'e' and 'g' cases on Intel. Now I guess it depends on how the file is compiled (with -mlong-double-80 or -mlong-double-128), but either way, one of the cases will be handled incorrectly.

Agreed. The addition of the 'g' handling for cases other than the specific platforms mentioned by the title for the patch is outside the intended scope and should probably be avoided.

Agreed.

xingxue updated this revision to Diff 246484.EditedFeb 25 2020, 8:41 AM

Addressed comments:

  • Added guard on S390 to only build in 128-bit long double mode.
  • Only recognize 'g' for parsing for powerpc and s390.
xingxue marked an inline comment as done.Feb 25 2020, 8:43 AM

It seems demangling g as __float128 (and not trying to interpret its value) is to be expected. Setting up e to handle 64-bit long double should be sufficient. I suggest to remove changing the handling of g from this patch.

libcxxabi/src/cxa_demangle.cpp
14

None of these restrictions are necessary if we stick to the established handling of g, which prints the raw byte string in hex. Yes, it is unfortunate that it displays as __float128 on Power, especially since the GCC compiler's __float128 on Power (requiring -mfloat128) mangles as something else. Dealing with that is another can of worms that might need coordination with the GCC folks.

libcxxabi/src/demangle/ItaniumDemangle.h
3793

I do not believe this change is necessary at this time. On powerpc64le-linux-gnu even with -mlong-double-128,
_Z1fIiEvP1AIXszplLg00000000000000004000000000000000EcvT__EEE
demangles using
__cxa_demangle picked up using -lsupc++
as
void f<int>(A<sizeof (((__float128)[00000000000000004000000000000000])+((int)()))>*).

Similarly, the same with
_Z1fIiEvP1AIXszplLg40000000000000000000000000000000EcvT__EEE
on s390x-linux-gnu gives
void f<int>(A<sizeof (((__float128)[40000000000000000000000000000000])+((int)()))>*).

4236

I'm going to suggest leaving g alone.

5185

If we aren't changing g to go here, then we don't need the change here.

5192

This is not going to be long enough for IBM double-double, which can have a large gap between the high and the low doubles.

Herald added a project: Restricted Project. · View Herald TranscriptApr 10 2020, 12:39 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
xingxue updated this revision to Diff 257053.Apr 13 2020, 12:00 PM

Addressed comments:

  • Leave the existing handling of type code g unchanged as suggested.
xingxue marked 7 inline comments as done.Apr 13 2020, 12:09 PM
xingxue added inline comments.
libcxxabi/src/demangle/ItaniumDemangle.h
3793

The change is not needed with the handling of type code g unchanged as suggested.

4236

Agreed.

5192

Leave it unchanged for now since the current handling of IBM double-double prints the raw byte string in hex.

xingxue marked 3 inline comments as done.Apr 13 2020, 1:16 PM
hubert.reinterpretcast retitled this revision from [demangler] Fix the parsing of long double literals for PowerPC and S390 to [demangler] PPC and S390: Fix parsing of e-prefixed long double literals.Apr 13 2020, 1:30 PM
hubert.reinterpretcast edited the summary of this revision. (Show Details)

LGTM with a minor comment.

libcxxabi/src/demangle/ItaniumDemangle.h
4243

Suggestion for a comment:
Handle cases where long doubles encoded with e have the same size and representation as doubles.

xingxue updated this revision to Diff 257114.Apr 13 2020, 2:30 PM

Addressed comments:

  • Added a comment as suggested.
xingxue marked an inline comment as done.Apr 13 2020, 3:22 PM
ldionne removed a reviewer: Restricted Project. ldionne added 1 blocking reviewer(s): erik.pilkington.Apr 14 2020, 5:05 AM
ldionne added a subscriber: ldionne.

I'll defer to @erik.pilkington.

Herald added a reviewer: Restricted Project. · View Herald TranscriptApr 14 2020, 5:05 AM
erik.pilkington accepted this revision.Apr 14 2020, 2:56 PM

Please also commit this to llvm's copy of the demangler. You can copy the change over by running the cp-to-llvm.sh script.

This revision is now accepted and ready to land.Apr 14 2020, 2:56 PM
This revision was automatically updated to reflect the committed changes.