Details

Reviewers

hubert.reinterpretcast
jasonliu
erik.pilkington
uweigand
mclow.lists

Group Reviewers

Restricted Project

Commits

rG4578fa8a1cc3: [demangler] PPC and S390: Fix parsing of e-prefixed long double literals

Summary

This patch is to fix the parsing of long double literals encoded with the e prefix on PowerPC and S390. For both PowerPC and S390, type code e is used for 64-bit long double literals and g is used for 128-bit long double literals. libcxxabi test case test_demangle.pass.cpp fails without the fix.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

xingxue created this revision. Feb 6 2020, 1:36 PM

Herald added a project: Restricted Project. Feb 6 2020, 1:36 PM

Herald added subscribers: libcxx-commits, steven.zhang, shchenz and 5 others.

hubert.reinterpretcast added inline comments. Feb 6 2020, 2:18 PM

libcxxabi/src/cxa_demangle.cpp
9 ↗	(On Diff #242989)	As information, this will block building the demangler on AIX with the `xlclang++` invocation of the IBM XL compiler.
10 ↗	(On Diff #242989)	This may need to also guard against building with `__LONG_DOUBLE_IEEE128__`.
libcxxabi/src/demangle/ItaniumDemangle.h
4228	As information, the status quo on AIX is that `e` is used for both 64-bit and 128-bit long double.

Herald added a reviewer: mclow.lists. Feb 6 2020, 2:18 PM

Herald added a subscriber: wuzish.

For S390, type code 'g' is used for 128-bit long double literals

There is also a historical mode of using IEEE-64 as "long double" on s390, where we also used 'e' to mangle it. However, this hasn't been used anywhere for the last 15 years or so. It is still supported via -mlong-double-64 in GCC; I see clang also supports this argument, but I'm not actually sure it works on s390 (I've certainly never tested it).

In D74163#1863842, @uweigand wrote:

There is also a historical mode of using IEEE-64 as "long double" on s390, where we also used 'e' to mangle it. However, this hasn't been used anywhere for the last 15 years or so. It is still supported via -mlong-double-64 in GCC; I see clang also supports this argument, but I'm not actually sure it works on s390 (I've certainly never tested it).

This implies the changes for PowerPC here would also apply to s390: The e parsing should go through the double value decoding, etc.

Addressed comments:

Guard against building with __LONG_DOUBLE_IEEE128__ on PowerPC so that it only builds in the double-double 128-bit long double mode.
Treat type code 'e' as 64-bit long double on S390 as well.
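The build guard described above depends on which `long double` representation the compiler building the demangler uses. As a rough illustration only, that representation can be classified from the mantissa width reported by `<cfloat>`. This sketch swaps in `LDBL_MANT_DIG` for the `__LONG_DOUBLE_IEEE128__` check named in the comment, and `describeHostLongDouble` is a hypothetical helper, not part of the patch.

```cpp
#include <cfloat>
#include <string>

// Hypothetical helper (not from the patch): name the long double
// representation of the compiler building this file, judged only by the
// mantissa width that <cfloat> reports.
inline std::string describeHostLongDouble() {
#if LDBL_MANT_DIG == 53
  return "IEEE binary64 (same representation as double)";
#elif LDBL_MANT_DIG == 64
  return "x87 80-bit extended";
#elif LDBL_MANT_DIG == 106
  return "IBM double-double";
#elif LDBL_MANT_DIG == 113
  return "IEEE binary128";
#else
  return "unknown";
#endif
}
```

A guard in the spirit of the one discussed here would turn the unwanted branches into `#error` directives instead of returning a string.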

xingxue marked an inline comment as done. Feb 7 2020, 10:37 AM

Making this dependent on preprocessor defines means you can only demangle when the target binary matches the host binary. That doesn't seem like a feasible approach.

In D74163#1867589, @thakis wrote:

Making this dependent on preprocessor defines means you can only demangle when the target binary matches the host binary. That doesn't seem like a feasible approach.

The patch here is following existing practice in the codebase. The target binary versus host binary issue already exists with respect to FloatData<long double>.

@uweigand, Hi, I've addressed your comments. Any further comments?

In D74163#1887301, @xingxue wrote:

@uweigand, Hi, I've addressed your comments. Any further comments?

I'm not very familiar with this code base. However, I am somewhat confused by your proposed change to "parseExprPrimary". In particular, where you now parse 'e' literals as "double" on powerpc/s390x, and 'g' literals as "long double" everywhere. This seems incorrect to me.

'e' literals really should be of "long double" *type* always. It's just that on powerpc and s390x, in an old ABI selected via -mlong-double-64, the "long double" type was implemented as IEEE-64 (just like "double", but the type is nevertheless still "long double", not "double").

'g' literals on the other hand really should be of type "long double" only on powerpc and s390x; on some other platforms, in particular x86, they should be of type "__float128" (on yet other platforms, 'g' is not used at all).

But what confuses me even more is how this whole routine is even supposed to work in non-native mode: it just uses the native "double" or "long double" types, but the host implementation of those may be different from the one active on the target (whether this is because of cross-compilation to another target, or simply to another ABI mode like -mlong-double-64 vs. -mlong-double-128). Is this routine only ever to be called natively?

Hi @uweigand, Thanks for your comments. Please see my explanations below.

In D74163#1887981, @uweigand wrote:

In D74163#1887301, @xingxue wrote:

@uweigand, Hi, I've addressed your comments. Any further comments?

I'm not very familiar with this code base. However, I am somewhat confused by your proposed change to "parseExprPrimary". In particular, where you now parse 'e' literals as "double" on powerpc/s390x, and 'g' literals as "long double" everywhere. This seems incorrect to me.

In mangled names, floating-point literals are encoded using a fixed-length lowercase hexadecimal string corresponding to the internal representation, high-order bytes first. For example, float literal -1.0f is encoded as "fbf800000". For a 64-bit long double literal on powerpc and s390x, the encoded form is type code 'e' followed by 16 hexadecimal digits. For a 128-bit long double literal on powerpc and s390x, the encoded form is type code 'g' followed by 32 hexadecimal digits. So, the proposed change allows the parser to treat type code 'e' as a double (64-bit) and take the following 16 hexadecimal digits as the internal representation of the literal, instead of treating it as a 128-bit long double and looking for 32 hexadecimal digits after it. When the type code is 'g', the parser will be looking for 32 hexadecimal digits. These are changes for parsing literals in the mangled names.
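To make the encoding concrete, here is a minimal sketch of decoding the hex digits that follow a type code. It assumes IEEE 754 `float` and `double` on the host and that integers and floating-point values share the same byte order. The helper names `parseHex`, `decodeFloatLiteral` and `decodeDoubleLiteral` are hypothetical, and this is not how ItaniumDemangle.h itself is written.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Parse exactly `digits` lowercase hex characters (at most 16) into `out`.
inline bool parseHex(const std::string &s, std::size_t digits,
                     std::uint64_t &out) {
  if (s.size() != digits || digits > 16) return false;
  std::uint64_t v = 0;
  for (char c : s) {
    unsigned d;
    if (c >= '0' && c <= '9') d = static_cast<unsigned>(c - '0');
    else if (c >= 'a' && c <= 'f') d = static_cast<unsigned>(c - 'a') + 10;
    else return false;
    v = (v << 4) | d;
  }
  out = v;
  return true;
}

// Type code 'f': 8 hex digits, reinterpreted as the bits of a float.
inline bool decodeFloatLiteral(const std::string &hex, float &out) {
  std::uint64_t bits;
  if (!parseHex(hex, 8, bits)) return false;
  std::uint32_t b32 = static_cast<std::uint32_t>(bits);
  std::memcpy(&out, &b32, sizeof out);
  return true;
}

// Type code 'd', or 'e' when long double is a 64-bit type: 16 hex digits,
// reinterpreted as the bits of a double.
inline bool decodeDoubleLiteral(const std::string &hex, double &out) {
  std::uint64_t bits;
  if (!parseHex(hex, 16, bits)) return false;
  std::memcpy(&out, &bits, sizeof out);
  return true;
}
```

With these helpers, the digits `bf800000` from the example above decode to -1.0f, and a 16-digit payload such as `3ff0000000000000` decodes to 1.0.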

'e' literals really should be of "long double" *type* always. It's just that on powerpc and s390x, in an old ABI selected via -mlong-double-64, the "long double" type was implemented as IEEE-64 (just like "double", but the type is nevertheless still "long double", not "double").

When printing out the demangled names, it still prints out "long double" for type code 'e' as usual, and "long double" for type code 'g' for powerpc and s390x (see lines 3789-3801 of the code).

'g' literals on the other hand really should be of type "long double" only on powerpc and s390x; on some other platforms, in particular x86, they should be of type "__float128" (on yet other platforms, 'g' is not used at all).

Right, please see lines 3793-3801 of the proposed change.

But what confuses me even more is how this whole routine is even supposed to work in non-native mode: it just uses the native "double" or "long double" types, but the host implementation of those may be different from the one active on the target (whether this is because of cross-compilation to another target, or simply to another ABI mode like -mlong-double-64 vs. -mlong-double-128). Is this routine only ever to be called natively?

__cxa_demangle() is a routine in runtime library libc++abi that is only built natively.

In D74163#1890160, @xingxue wrote:

Hi @uweigand, Thanks for your comments. Please see my explanations below.

In D74163#1887981, @uweigand wrote:

In D74163#1887301, @xingxue wrote:

@uweigand, Hi, I've addressed your comments. Any further comments?

I'm not very familiar with this code base. However, I am somewhat confused by your proposed change to "parseExprPrimary". In particular, where you now parse 'e' literals as "double" on powerpc/s390x, and 'g' literals as "long double" everywhere. This seems incorrect to me.

In mangled names, floating-point literals are encoded using a fixed-length lowercase hexadecimal string corresponding to the internal representation, high-order bytes first. For example, float literal -1.0f is encoded as "fbf800000". For a 64-bit long double literal on powerpc and s390x, the encoded form is type code 'e' followed by 16 hexadecimal digits. For a 128-bit long double literal on powerpc and s390x, the encoded form is type code 'g' followed by 32 hexadecimal digits. So, the proposed change allows the parser to treat type code 'e' as a double (64-bit) and take the following 16 hexadecimal digits as the internal representation of the literal, instead of treating it as a 128-bit long double and looking for 32 hexadecimal digits after it. When the type code is 'g', the parser will be looking for 32 hexadecimal digits. These are changes for parsing literals in the mangled names.

So I guess the point I was missing is that "treat as double" / "treat as long double" here simply means to decode the byte sequence as if it were a native value of that type (in default compilation mode). Then the change for powerpc and s390x does indeed look correct to me.

'e' literals really should be of "long double" *type* always. It's just that on powerpc and s390x, in an old ABI selected via -mlong-double-64, the "long double" type was implemented as IEEE-64 (just like "double", but the type is nevertheless still "long double", not "double").

When printing out the demangled names, it still prints out "long double" for type code 'e' as usual, and "long double" for type code 'g' for powerpc and s390x (see lines 3789-3801 of the code).

OK.

'g' literals on the other hand really should be of type "long double" only on powerpc and s390x; on some other platforms, in particular x86, they should be of type "__float128" (on yet other platforms, 'g' is not used at all).

Right, please see lines 3793-3801 of the proposed change.

I'm still wondering about Intel. Can there ever be a literal encoded using 'g' on Intel? If yes, then treating it as "long double" would still be wrong, because 'g' encodes IEEE128 (__float128), while "long double" is the Intel extended (80-bit) format, right?

On the other hand, if 'g' encoded literals can never happen on Intel (or other platforms), maybe it would be better to have the code handling 'g' within a #ifdef section only active on powerpc and s390?

But what confuses me even more is how this whole routine is even supposed to work in non-native mode: it just uses the native "double" or "long double" types, but the host implementation of those may be different from the one active on the target (whether this is because of cross-compilation to another target, or simply to another ABI mode like -mlong-double-64 vs. -mlong-double-128). Is this routine only ever to be called natively?

__cxa_demangle() is a routine in runtime library libc++abi that is only built natively.

OK, got it. Even more so, I guess we must also ensure that it is only built using the default compiler setting (e.g. it is built with the default -mlong-double-128 on s390x, not with -mlong-double-64). But that's probably a reasonable assumption.

In D74163#1891047, @uweigand wrote:

I'm still wondering about Intel. Can there ever be a literal encoded using 'g' on Intel? If yes, then treating it as "long double" would still be wrong, because 'g' encodes IEEE128 (__float128), while "long double" is the Intel extended (80-bit) format, right?

A literal such as 1.Q would appear as g3fff0000000000000000000000000000. As would 1.L with -mlong-double-128.
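For readers unfamiliar with the IEEE binary128 layout, the digits in that string can be reproduced by hand: one sign bit, a 15-bit exponent biased by 16383 (0x3fff), and a 112-bit fraction that is zero for exact powers of two. The sketch below builds the text for positive powers of two only. `quadPowerOfTwoLiteral` is a hypothetical helper for illustration, not something Clang or the demangler provides.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: mangled literal text ('g' + 32 hex digits) for the
// positive IEEE binary128 value 2^exp2. Returns an empty string when the
// value is outside the normal range.
inline std::string quadPowerOfTwoLiteral(int exp2) {
  const int bias = 16383;
  const int field = exp2 + bias;  // 15-bit biased exponent; sign bit is 0
  if (field <= 0 || field >= 0x7fff) return std::string();
  char head[5];
  // Sign bit plus exponent occupy the first 16 bits, i.e. 4 hex digits.
  std::snprintf(head, sizeof head, "%04x", static_cast<unsigned>(field));
  // The 112-bit fraction is all zeros: 28 more hex digits.
  return "g" + std::string(head) + std::string(28, '0');
}
```

Calling `quadPowerOfTwoLiteral(0)` yields `g3fff` followed by 28 zeros, which is the encoding quoted above for 1.Q.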

__cxa_demangle() is a routine in runtime library libc++abi that is only built natively.

OK, got it. Even more so, I guess we must also ensure that it is only built using the default compiler setting (e.g. it is built with the default -mlong-double-128 on s390x, not with -mlong-double-64). But that's probably a reasonable assumption.

The patch already adds a preprocessor check for that on Power. I've added an inline comment to do the same for s390x.

libcxxabi/src/cxa_demangle.cpp
11 ↗	(On Diff #243223)	A similar error for building with 64-bit long double on s390x would be appropriate.

I'm still wondering about Intel. Can there ever be a literal encoded using 'g' on Intel? If yes, then treating it as "long double" would still be wrong, because 'g' encodes IEEE128 (__float128), while "long double" is the Intel extended (80-bit) format, right?

On the other hand, if 'g' encoded literals can never happen on Intel (or other platforms), maybe it would be better to have the code handling 'g' within a #ifdef section only active on powerpc and s390?

For X86, 'e' is used for 80-bit long double and 'g' is used for 128-bit long double. The following is the code in Clang.

clang/lib/Basic/Targets/X86.h
....
const char *getLongDoubleMangling() const override {
  return LongDoubleFormat == &llvm::APFloat::IEEEquad() ? "g" : "e";
}
...

In D74163#1891378, @xingxue wrote:
I'm still wondering about Intel. Can there ever be a literal encoded using 'g' on Intel? If yes, then treating it as "long double" would still be wrong, because 'g' encodes IEEE128 (__float128), while "long double" is the Intel extended (80-bit) format, right?

On the other hand, if 'g' encoded literals can never happen on Intel (or other platforms), maybe it would be better to have the code handling 'g' within a #ifdef section only active on powerpc and s390?

For X86, 'e' is used for 80-bit long double and 'g' is used for 128-bit long double. The following is the code in Clang.
clang/lib/Basic/Targets/X86.h
....
const char *getLongDoubleMangling() const override {
  return LongDoubleFormat == &llvm::APFloat::IEEEquad() ? "g" : "e";
}
...

But then this patch must be incorrect, given that it does

return getDerived().template parseFloatingLiteral<long double>();

for both the 'e' and 'g' cases on Intel. Now I guess it depends on how the file is compiled (with -mlong-double-80 or -mlong-double-128), but either way, one of the cases will be handled incorrectly.

In D74163#1891408, @uweigand wrote:
But then this patch must be incorrect, given that it does
return getDerived().template parseFloatingLiteral<long double>();
for both the 'e' and 'g' cases on Intel. Now I guess it depends on how the file is compiled (with -mlong-double-80 or -mlong-double-128), but either way, one of the cases will be handled incorrectly.

Agreed. The addition of the 'g' handling for cases other than the specific platforms mentioned by the title for the patch is outside the intended scope and should probably be avoided.

In D74163#1891408, @uweigand wrote:
In D74163#1891378, @xingxue wrote:
I'm still wondering about Intel. Can there ever be a literal encoded using 'g' on Intel? If yes, then treating it as "long double" would still be wrong, because 'g' encodes IEEE128 (__float128), while "long double" is the Intel extended (80-bit) format, right?

On the other hand, if 'g' encoded literals can never happen on Intel (or other platforms), maybe it would be better to have the code handling 'g' within a #ifdef section only active on powerpc and s390?

For X86, 'e' is used for 80-bit long double and 'g' is used for 128-bit long double. The following is the code in Clang.
clang/lib/Basic/Targets/X86.h
....
const char *getLongDoubleMangling() const override {
  return LongDoubleFormat == &llvm::APFloat::IEEEquad() ? "g" : "e";
}
...
But then this patch must be incorrect, given that it does
return getDerived().template parseFloatingLiteral<long double>();
for both the 'e' and 'g' cases on Intel. Now I guess it depends on how the file is compiled (with -mlong-double-80 or -mlong-double-128), but either way, one of the cases will be handled incorrectly.

In D74163#1891476, @hubert.reinterpretcast wrote:
In D74163#1891408, @uweigand wrote:
But then this patch must be incorrect, given that it does
return getDerived().template parseFloatingLiteral<long double>();
for both the 'e' and 'g' cases on Intel. Now I guess it depends on how the file is compiled (with -mlong-double-80 or -mlong-double-128), but either way, one of the cases will be handled incorrectly.
Agreed. The addition of the 'g' handling for cases other than the specific platforms mentioned by the title for the patch is outside the intended scope and should probably be avoided.

Agreed.

Addressed comments:

Added guard on S390 to only build in 128-bit long double mode.
Only recognize 'g' for parsing for powerpc and s390.

xingxue marked an inline comment as done. Feb 25 2020, 8:43 AM

It seems demangling g as __float128 (and not trying to interpret its value) is to be expected. Setting up e to handle 64-bit long double should be sufficient. I suggest removing the change to the handling of g from this patch.

libcxxabi/src/cxa_demangle.cpp
14 ↗	(On Diff #246484)	None of these restrictions are necessary if we stick to the established handling of `g`, which prints the raw byte string in hex. Yes, it is unfortunate that it displays as `__float128` on Power, especially since the GCC compiler's `__float128` on Power (requiring `-mfloat128`) mangles as something else. Dealing with that is another can of worms that might need coordination with the GCC folks.
libcxxabi/src/demangle/ItaniumDemangle.h
3793	I do not believe this change is necessary at this time. On `powerpc64le-linux-gnu` even with `-mlong-double-128`, `_Z1fIiEvP1AIXszplLg00000000000000004000000000000000EcvT__EEE` demangles using `__cxa_demangle` picked up using `-lsupc++` as `void f<int>(A<sizeof (((__float128)[00000000000000004000000000000000])+((int)()))>)`. Similarly, the same with `_Z1fIiEvP1AIXszplLg40000000000000000000000000000000EcvT__EEE` on `s390x-linux-gnu` gives `void f<int>(A<sizeof (((__float128)[40000000000000000000000000000000])+((int)()))>)`.
4230	I'm going to suggest leaving `g` alone.
5174	If we aren't changing `g` to go here, then we don't need the change here.
5181	This is not going to be long enough for IBM double-double, which can have a large gap between the high and the low doubles.
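The behaviour described in these inline comments can be checked on a given platform with a small wrapper around `abi::__cxa_demangle` from `<cxxabi.h>`. This is only a sketch: the wrapper name `demangle` is hypothetical, and what is printed for the `Lg...E` literals depends on which demangler the program links against, so no output is claimed for them here.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Returns the demangled form of `mangled`, or an empty string on failure.
inline std::string demangle(const char *mangled) {
  int status = 0;
  // Passing null for the buffer and length asks __cxa_demangle to allocate
  // the result with malloc; the caller must free it.
  char *out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || out == nullptr) return std::string();
  std::string result(out);
  std::free(out);
  return result;
}
```

Passing `_Z1fIiEvP1AIXszplLg00000000000000004000000000000000EcvT__EEE` to this wrapper on the target of interest shows how that platform's demangler treats a `g` literal.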

Herald added a project: Restricted Project. Apr 10 2020, 12:39 PM

Herald added a reviewer: Restricted Project.

Addressed comments:

Leave the existing handling of type code g unchanged as suggested.

xingxue marked 7 inline comments as done. Apr 13 2020, 12:09 PM

xingxue added inline comments.

libcxxabi/src/demangle/ItaniumDemangle.h
3793	The change is not needed with the handling of type code `g` unchanged as suggested.
4230	Agreed.
5181	Leave it unchanged for now since the current handling of IBM double-double prints the raw byte string in hex.

Harbormaster failed remote builds in B52967: Diff 257053! Apr 13 2020, 1:00 PM

xingxue marked 3 inline comments as done. Apr 13 2020, 1:16 PM

hubert.reinterpretcast retitled this revision from "[demangler] Fix the parsing of long double literals for PowerPC and S390" to "[demangler] PPC and S390: Fix parsing of e-prefixed long double literals". Apr 13 2020, 1:30 PM

hubert.reinterpretcast edited the summary of this revision.

LGTM with a minor comment.

libcxxabi/src/demangle/ItaniumDemangle.h
4229	Suggestion for a comment: Handle cases where `long double`s encoded with `e` have the same size and representation as `double`s.

Addressed comments:

Added a comment as suggested.

Harbormaster failed remote builds in B53005: Diff 257114! Apr 13 2020, 3:15 PM

xingxue marked an inline comment as done. Apr 13 2020, 3:22 PM

I'll defer to @erik.pilkington.

Herald added a reviewer: Restricted Project. Apr 14 2020, 5:05 AM

Please also commit this to llvm's copy of the demangler. You can copy the change over by running the cp-to-llvm.sh script.

This revision is now accepted and ready to land. Apr 14 2020, 2:56 PM

Closed by commit rG4578fa8a1cc3: [demangler] PPC and S390: Fix parsing of e-prefixed long double literals (authored by xingxue). Apr 15 2020, 7:07 AM

This revision was automatically updated to reflect the committed changes.

This is an archive of the discontinued LLVM Phabricator instance.

[demangler] PPC and S390: Fix parsing of e-prefixed long double literals
ClosedPublic
