This patch is to fix the parsing of long double literals encoded with the e prefix on PowerPC and S390. For both PowerPC and S390, type code e is used for 64-bit long double literals and g is used for 128-bit long double literals. libcxxabi test case test_demangle.pass.cpp fails without the fix.
Details
- Reviewers
hubert.reinterpretcast jasonliu erik.pilkington uweigand mclow.lists - Group Reviewers
Restricted Project - Commits
- rG4578fa8a1cc3: [demangler] PPC and S390: Fix parsing of e-prefixed long double literals
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
libcxxabi/src/cxa_demangle.cpp | ||
---|---|---|
9 | As information, this will block building the demangler on AIX with the xlclang++ invocation of the IBM XL compiler. | |
10 | This may need to also guard against building with __LONG_DOUBLE_IEEE128__. | |
libcxxabi/src/demangle/ItaniumDemangle.h | ||
4235 | As information, the status quo on AIX is that e is used for both 64-bit and 128-bit long double. |
For S390, type code 'g' is used for 128-bit long double literals
There is also a historical mode of using IEEE-64 as "long double" on s390, where we also used 'e' to mangle it. However, this hasn't been used anywhere for the last 15 years or so. It is still supported via -mlong-double-64 in GCC; I see clang also supports this argument, but I'm not actually sure it works on s390 (I've certainly never tested it).
This implies the changes for PowerPC here would also apply to s390: The e parsing should go through the double value decoding, etc.
Addressed comments:
- Guard against building with __LONG_DOUBLE_IEEE128__ on PowerPC so that it only builds in the double-double 128-bit long double mode.
- Treat type code 'e' as 64-bit long double on S390 as well.
Making this dependent on preprocessor defines means you can only demangle when the target binary matches the host binary. That doesn't seem like a feasible approach.
The patch here is following existing practice in the codebase. The targets binary versus host binary issue already exists with respect to FloatData<long double>.
I'm not very familiar with this code base. However, I am somewhat confused by your proposed change to "parseExprPrimary". In particular, where you now parse 'e' literals as "double" on powerpc/s390x, and 'g' literals as "long double" everywhere. This seems incorrect to me.
'e' literals really should be of "long double" *type* always. It's just that on powerpc and s390x, in an old ABI selected via -mlong-double-64, the "long double" type was implemented as IEEE-64 (just like "double", but the type is nevertheless still "long double", not "double").
'g' literals on the other hand really should be of type "long double" only on powerpc and s390x; on some other platforms, in particular x86, they should be of type "__float128" (on yet other platforms, 'g' is not used at all).
But what confuses me even more is how this whole routine is even supposed to work in non-native mode: it just uses the native "double" or "long double" types, but the host implementation of those may be different from the one active on the target (whether this is because of cross-compilation to another target, or simply to another ABI mode like -mlong-double-64 vs. -mlong-double-128). Is this routine only ever to be called natively?
Hi @uweigand, Thanks for your comments. Please see my explanations below.
In mangled names, floating-point literals are encoded using a fixed-length lowercase hexadecimal string corresponding to the internal representation, high-order bytes first. For example, float literal -1.0f is encoded as "fbf800000". For a 64-bit long double literal on powerpc and s390x, the encoded form is type code 'e' followed by 16 hexadecimal digits. For a 128-bit long double literal on powerpc and s390x, the encoded form is type code 'g' followed by 32 hexadecimal digits. So, the proposed the change allows the parser to treat type code 'e' as a double (64-bit) and take the following 16 hexadecimal digits as the internal representation of the literal, instead of treating it as a 128-bit long double and looking for 32 hexadecimal digits after it. When the type code is 'g', the parser will be looking for 32 hexadecimal digits. These are changes for parsing literals in the mangled names.
'e' literals really should be of "long double" *type* always. It's just that on powerpc and s390x, in an old ABI selected via -mlong-double-64, the "long double" type was implemented as IEEE-64 (just like "double", but the type is nevertheless still "long double", not "double").
When printing out the demangled names, it still prints out "long double" for type code 'e' as usual, and "long double" for type code 'g' for powerpc and s390x (see lines 3789-3801 of the code).
'g' literals on the other hand really should be of type "long double" only on powerpc and s390x; on some other platforms, in particular x86, they should be of type "__float128" (on yet other platforms, 'g' is not used at all).
Right, please see lines 3793-3801 of the proposed change.
But what confuses me even more is how this whole routine is even supposed to work in non-native mode: it just uses the native "double" or "long double" types, but the host implementation of those may be different from the one active on the target (whether this is because of cross-compilation to another target, or simply to another ABI mode like -mlong-double-64 vs. -mlong-double-128). Is this routine only ever to be called natively?
__cxa_demangle() is a routine in runtime library libc++abi that is only built natively.
So I guess the point I was missing is that "treat as double" / "treat as long double" here simply means to decode the byte sequence as if it were a native value of that type (in default compilation mode). Then the change for powerpc and s390x does indeed look correct to me.
'e' literals really should be of "long double" *type* always. It's just that on powerpc and s390x, in an old ABI selected via -mlong-double-64, the "long double" type was implemented as IEEE-64 (just like "double", but the type is nevertheless still "long double", not "double").
When printing out the demangled names, it still prints out "long double" for type code 'e' as usual, and "long double" for type code 'g' for powerpc and s390x (see lines 3789-3801 of the code).
OK.
'g' literals on the other hand really should be of type "long double" only on powerpc and s390x; on some other platforms, in particular x86, they should be of type "__float128" (on yet other platforms, 'g' is not used at all).
Right, please see lines 3793-3801 of the proposed change.
I'm still wondering about Intel. Can there ever be a literal encoded using 'g' on Intel? If yes, then treating it as "long double" would still be wrong, because 'g' encodes IEEE128 (__float128), while "long double" is the Intel extended (80-bit) format, right?
On the other hand, if 'g' encoded literals can never happen on Intel (or other platforms), maybe it would be better to have the code handling 'g' within a #ifdef section only active on powerpc and s390?
But what confuses me even more is how this whole routine is even supposed to work in non-native mode: it just uses the native "double" or "long double" types, but the host implementation of those may be different from the one active on the target (whether this is because of cross-compilation to another target, or simply to another ABI mode like -mlong-double-64 vs. -mlong-double-128). Is this routine only ever to be called natively?
__cxa_demangle() is a routine in runtime library libc++abi that is only built natively.
OK, got it. Even more so, I guess we must also ensure that it is only built using the default compiler setting (e.g. it is built with the default -mlong-double-128 on s390x, not with -mlong-double-64). But that's probably a reasonable assumption.
A literal such as 1.Q would appear as g3fff0000000000000000000000000000. As would 1.L with -mlong-double-128.
__cxa_demangle() is a routine in runtime library libc++abi that is only built natively.
OK, got it. Even more so, I guess we must also ensure that it is only built using the default compiler setting (e.g. it is built with the default -mlong-double-128 on s390x, not with -mlong-double-64). But that's probably a reasonable assumption.
The patch already adds a preprocessor check for that on Power. I've added an inline comment to do the same for s390x.
libcxxabi/src/cxa_demangle.cpp | ||
---|---|---|
11 | A similar error for building with 64-bit long double on s390x would be appropriate. |
I'm still wondering about Intel. Can there ever be a literal encoded using 'g' on Intel? If yes, then treating it as "long double" would still be wrong, because 'g' encodes IEEE128 (__float128), while "long double" is the Intel extended (80-bit) format, right?
On the other hand, if 'g' encoded literals can never happen on Intel (or other platforms), maybe it would be better to have the code handling 'g' within a #ifdef section only active on powerpc and s390?
For X86, 'e' is used for 80-bit long double and 'g' is used for 128-bit long double. The following is the code in Clang.
clang/lib/Basic/Targets/X86.h .... const char *getLongDoubleMangling() const override { return LongDoubleFormat == &llvm::APFloat::IEEEquad() ? "g" : "e"; } ...
But then this patch must be incorrect, given that it does
return getDerived().template parseFloatingLiteral<long double>();
for both the 'e' and 'g' cases on Intel. Now I guess it depends on how the file is compiled (with -mlong-double-80 or -mlong-double-128), but either way, one of the cases will be handled incorrectly.
Agreed. The addition of the 'g' handling for cases other than the specific platforms mentioned by the title for the patch is outside the intended scope and should probably be avoided.
Addressed comments:
- Added guard on S390 to only build in 128-bit long double mode.
- Only recognize 'g' for parsing for powerpc and s390.
It seems demangling g as __float128 (and not trying to interpret its value) is to be expected. Setting up e to handle 64-bit long double should be sufficient. I suggest to remove changing the handling of g from this patch.
libcxxabi/src/cxa_demangle.cpp | ||
---|---|---|
14 | None of these restrictions are necessary if we stick to the established handling of g, which prints the raw byte string in hex. Yes, it is unfortunate that it displays as __float128 on Power, especially since the GCC compiler's __float128 on Power (requiring -mfloat128) mangles as something else. Dealing with that is another can of worms that might need coordination with the GCC folks. | |
libcxxabi/src/demangle/ItaniumDemangle.h | ||
3793 | I do not believe this change is necessary at this time. On powerpc64le-linux-gnu even with -mlong-double-128, Similarly, the same with | |
4236 | I'm going to suggest leaving g alone. | |
5185 | If we aren't changing g to go here, then we don't need the change here. | |
5192 | This is not going to be long enough for IBM double-double, which can have a large gap between the high and the low doubles. |
Addressed comments:
- Leave the existing handling of type code g unchanged as suggested.
LGTM with a minor comment.
libcxxabi/src/demangle/ItaniumDemangle.h | ||
---|---|---|
4244 | Suggestion for a comment: |
Please also commit this to llvm's copy of the demangler. You can copy the change over by running the cp-to-llvm.sh script.
As information, this will block building the demangler on AIX with the xlclang++ invocation of the IBM XL compiler.