Page MenuHomePhabricator

[CGBuiltin] Respect asm labels and redefine_extname for builtins with specialized emitting
ClosedPublic

Authored by MaskRay on Oct 1 2020, 4:54 PM.

Details

Summary

rL131311 added asm() support for builtin functions, but asm() for builtins with
specialized emitting (e.g. memcpy, various math functions) still do not work.

This patch makes these functions work for asm() and #pragma redefine_extname.
glibc uses asm() to redirect internal libc function calls to hidden aliases.

Limitation: such a function is a builtin in clang, but will not be recognized as
a libcall in optimization passes because Clang does not annotate the renamed
function as a libcall. In GCC -O1 or above, abs can be optimized out but we can't.
Additionally, we cannot redirect __builtin_sin to real_sin in the following example:

double sin(double x) asm("real_sin");
double f(double d) { return __builtin_sin(d); }

According to @rsmith, the following three statements cannot be simultaneously true:

(1) The frontend function foo has known, builtin semantics X.
(2) The symbol foo has known, builtin semantics X.
(3) It's not correct to lower a call to the frontend function foo to the symbol foo.

People do want to keep (1) (if it is profitable to expand a memcpy, do it).
This also means that people do not want to add -fno-builtin-memcpy.
People do want (3): that is why they use asm("__GI_memcpy") in the first place.

So unfortunately we make a compromise by not refuting (2) (see the limitation above).
For most libcalls, there is a small loss because compilers don't synthesize them.
For the few glibc cares about, it uses asm("memcpy = __GI_memcpy"); to make
the assembly level redirection.
(Changing function names (e.g. __memcpy) is a hit to ergonomics which is not acceptable).

Diff Detail

Event Timeline

MaskRay created this revision.Oct 1 2020, 4:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 1 2020, 4:54 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
MaskRay requested review of this revision.Oct 1 2020, 4:54 PM
MaskRay updated this revision to Diff 295696.Oct 1 2020, 5:02 PM

Improve tests

What are the expected semantics in this case? Is this:

  • the function is still the builtin, but if it ends up being a libcall, call a function with a different asm name, or
  • the function is not considered to be the builtin any more, or
  • something else?

I think this patch is approximately the first thing, but it's also cutting off emission for cases where we wouldn't emit a libcall. Should we make that distinction?

MaskRay updated this revision to Diff 297667.Oct 12 2020, 12:40 PM
MaskRay edited the summary of this revision. (Show Details)

Mention the limitation

What are the expected semantics in this case? Is this:

  • the function is still the builtin, but if it ends up being a libcall, call a function with a different asm name, or
  • the function is not considered to be the builtin any more, or
  • something else?

I think this patch is approximately the first thing, but it's also cutting off emission for cases where we wouldn't emit a libcall. Should we make that distinction?

Yes, we do the first one. I mentioned the limitation in the description. This happens with some functions like abs which clang has customized emit without using a library call.

MaskRay edited the summary of this revision. (Show Details)Oct 12 2020, 12:52 PM
MaskRay edited the summary of this revision. (Show Details)Oct 12 2020, 1:23 PM

What are the expected semantics in this case? Is this:

  • the function is still the builtin, but if it ends up being a libcall, call a function with a different asm name, or
  • the function is not considered to be the builtin any more, or
  • something else?

I think this patch is approximately the first thing, but it's also cutting off emission for cases where we wouldn't emit a libcall. Should we make that distinction?

Yes, we do the first one. I mentioned the limitation in the description. This happens with some functions like abs which clang has customized emit without using a library call.

What should happen for a case where the LLVM mid-level optimizers insert a libcall (for example, inventing a call to memcpy), and memcpy was renamed at the source level to another name? Do we still call the original name?

Generally I wonder if we should instead be renaming the function in LLVM's TargetLibraryInfo, rather than getting the frontend to try to set up a situation where LLVM is unlikely to generate a call to the "wrong" name for the builtin.

MaskRay added a comment.EditedOct 12 2020, 3:39 PM

What are the expected semantics in this case? Is this:

  • the function is still the builtin, but if it ends up being a libcall, call a function with a different asm name, or
  • the function is not considered to be the builtin any more, or
  • something else?

I think this patch is approximately the first thing, but it's also cutting off emission for cases where we wouldn't emit a libcall. Should we make that distinction?

Yes, we do the first one. I mentioned the limitation in the description. This happens with some functions like abs which clang has customized emit without using a library call.

What should happen for a case where the LLVM mid-level optimizers insert a libcall (for example, inventing a call to memcpy), and memcpy was renamed at the source level to another name? Do we still call the original name?

In that case, I believe GCC does not handle it as well. glibc's approach is to ... let GNU as handle it (D88625 ports the behavior to MC). If a GCC pass somehow emits a memcpy call, the memcpy = __GI_memcpy; inline assembly can redirect the function call to __GI_memcpy.

Generally I wonder if we should instead be renaming the function in LLVM's TargetLibraryInfo, rather than getting the frontend to try to set up a situation where LLVM is unlikely to generate a call to the "wrong" name for the builtin.

Ideally, probably yes. Then clang does asm mangling for no-builtin functions and LLVM does asm manging for builtin functions... There needs to be a new IR level feature. There is complexity in the overloaded builtins...

MaskRay added a comment.EditedOct 14 2020, 11:04 AM

In case my previous comment is not clear: we can do renaming in LLVM, but the benefit is small (for a few libcalls (only some really simple libcalls) with custom code emitting, if they have asm(...), they are now optimizable (these users should probably use __builtin_* in the first place)). We will require a renaming infrastructure in LLVM IR, which is a large undertaking. The patch as is is the most practical approach.

I worry that we're chasing after the implementation details of GCC here. It seems self-contradictory to say all three of these things at once:

  1. The frontend function foo has known, builtin semantics X. (Eg, we'll use those semantics in the frontend.)
  2. The symbol foo has known, builtin semantics X. (Eg, we'll synthesize calls to foo from the middle-end.)
  3. It's not correct to lower a call to the frontend function foo to the symbol foo.

So, from a principled standpoint, which of the above do we not consider to be correct in this situation?

  • If we don't consider (1) to be correct, then we need to stop treating foo as a builtin everywhere -- we shouldn't be constant-evaluating calls to it, in particular. That's problematic in principle, because the asm label might be added after we've already seen the first declaration and added a BuiltinAttr to it. If we want to go in this direction, I think we should require a build that attaches an asm label to a builtin to also pass -fno-builtin-foo rather than trying to un-builtin a builtin function after the fact.
  • If we don't consider (2) to be correct, then we need to stop the middle-end from inserting calls to the symbol foo, and get it to use the new symbol instead. That'd be the "teach the target libcall info about this" approach, which (as you point out) would be quite complex.
  • The status quo is that we don't consider (3) to be correct. If we take this path, I suppose we could say that we make a best effort to use the renamed symbol, but provide no guarantees. That seems unprincipled and will likely continue to cause problems down the line, but maybe this gets us closest to the observed GCC behavior?
joerg added a comment.Oct 15 2020, 1:28 PM

Disabling builtin handling when asm is enabled would be a major annoyance for NetBSD. The interaction between symbol renaming and TLI is complicated. There are cases where it would make perfect sense and others were it would be harmful. Example of the latter is the way SSP is implemented by inline functions that call a prototype renamed back to the original symbol name.

I worry that we're chasing after the implementation details of GCC here. It seems self-contradictory to say all three of these things at once:

  1. The frontend function foo has known, builtin semantics X. (Eg, we'll use those semantics in the frontend.)
  2. The symbol foo has known, builtin semantics X. (Eg, we'll synthesize calls to foo from the middle-end.)
  3. It's not correct to lower a call to the frontend function foo to the symbol foo.

So, from a principled standpoint, which of the above do we not consider to be correct in this situation?

  • If we don't consider (1) to be correct, then we need to stop treating foo as a builtin everywhere -- we shouldn't be constant-evaluating calls to it, in particular. That's problematic in principle, because the asm label might be added after we've already seen the first declaration and added a BuiltinAttr to it. If we want to go in this direction, I think we should require a build that attaches an asm label to a builtin to also pass -fno-builtin-foo rather than trying to un-builtin a builtin function after the fact.
  • If we don't consider (2) to be correct, then we need to stop the middle-end from inserting calls to the symbol foo, and get it to use the new symbol instead. That'd be the "teach the target libcall info about this" approach, which (as you point out) would be quite complex.
  • The status quo is that we don't consider (3) to be correct. If we take this path, I suppose we could say that we make a best effort to use the renamed symbol, but provide no guarantees. That seems unprincipled and will likely continue to cause problems down the line, but maybe this gets us closest to the observed GCC behavior?

It took me quite a while to understand the "take 2 out of the 3 guarantees" theorem;-)

I think people do want to keep (1) (if it is profitable to expand a memcpy, do it). This also means that people do not want to add -fno-builtin-memcpy.

People do want (3): that is why they use an asm("__GI_memcpy") in the first place.
The status quo is "we don't consider (3) to be correct." -> "we ignore asm("__GI_memcpy")". This makes the system consistent but it is unexpected.

So unfortunately we are making a compromise on (2): refuting it is difficult in both GCC and Clang.
For most libcalls, compilers don't generate them, so it is a small loss. For the few glibc cares about, it uses the asm("memcpy = __GI_memcpy"); GNU as feature to make the second level redirection.
D88625 will do it on MC side.

We still have another choice: don't use memcpy, use __memcpy instead. However, that will be a hard hammer and according to @zatrazz people will complain 'with gcc this works as intended'...

rsmith accepted this revision.Oct 15 2020, 2:32 PM

If the answer is that we consider (2) to be incorrect, but that this is only a partial step in that direction, I think that's fine -- that's enough that we can know where the bug is if / when people complain that we still generate calls to memcpy even when we used an asm label to indicate the library symbol was renamed to something else. In that light, this is really a workaround rather than a fix: we can't easily stop the middle-end from synthesizing calls to library functions with the wrong symbol, because we don't have a good way to tell it those library functions were renamed, but we can at least reduce the rate of incidence by avoiding generating the intrinsic calls that are likely to get lowered to library functions.

Given that understanding, should this:

double sin(double x) asm("real_sin");
double f(double d) { return __builtin_sin(d); }

... call the sin symbol or the real_sin symbol? I think with the current patch it'll call the sin symbol, but calling real_sin seems correct.

In any case, assuming the above understanding is right, this looks like a step in the right direction.

This revision is now accepted and ready to land.Oct 15 2020, 2:32 PM
MaskRay updated this revision to Diff 298480.Oct 15 2020, 3:03 PM
MaskRay edited the summary of this revision. (Show Details)

Add an example about inconsistency. Elaborate on the limitation.

If the answer is that we consider (2) to be incorrect, but that this is only a partial step in that direction, I think that's fine -- that's enough that we can know where the bug is if / when people complain that we still generate calls to memcpy even when we used an asm label to indicate the library symbol was renamed to something else. In that light, this is really a workaround rather than a fix: we can't easily stop the middle-end from synthesizing calls to library functions with the wrong symbol, because we don't have a good way to tell it those library functions were renamed, but we can at least reduce the rate of incidence by avoiding generating the intrinsic calls that are likely to get lowered to library functions.

Given that understanding, should this:

double sin(double x) asm("real_sin");
double f(double d) { return __builtin_sin(d); }

... call the sin symbol or the real_sin symbol? I think with the current patch it'll call the sin symbol, but calling real_sin seems correct.

In any case, assuming the above understanding is right, this looks like a step in the right direction.

Thanks for the review! For this example, GCC emits real_sin but we fail to do so. Added it to the test.

MaskRay edited the summary of this revision. (Show Details)Oct 15 2020, 3:10 PM
This revision was landed with ongoing or failed builds.Oct 15 2020, 3:14 PM
This revision was automatically updated to reflect the committed changes.

Hi,

I am on Ubuntu 18 machine and it has finite math header <bits/math-finite.h>.
This header is included by the glibc 2.27. This header has this following definition.
extern double log (double) asm ("" "log_finite") attribute__ ((nothrow ));

Consider the following test case

#include <math.h>
double mylog (double d) {

return log(d);

}

Before this patch clang generates -O2 -ffast-math
--Snip--
; Function Attrs: nounwind readnone uwtable
define dso_local double @mlog(double %d) local_unnamed_addr #0 {
entry:

%0 = tail call fast double @llvm.log.f64(double %d)
ret double %0

}
---Snip--

After this patch on machines with lesser glibc versions, for -O2 -ffast-math, clang generates

--Snip--
; Function Attrs: nounwind readnone uwtable
define dso_local double @mlog(double %d) local_unnamed_addr #0 {
entry:

%call = tail call fast double @__log_finite(double %d) #2
ret double %call

}
--Snip--

Note on latest Ubuntu 20.04.1 LTS with Glibc 2.31 <bits/math-finite.h> header is not present
It is removed by glibc. so there I see the @llvm.log.f64 intrinsic calls.

So on machines with lesser glibc versions where the header file is present this will be a problem.
Note the LLVM "opt" does optimizations like vectorization only on the intrinsic, but ignores the calls to @__log_finite.

Example https://llvm.godbolt.org/z/765We8

I am not sure if this known issue.

It is causing some performance issues with some benchmarks on my ubuntu 18 machine.

Shall I file a PR for this ?

For your example:

extern double log (double) asm ("" "log_finite") __attribute__ ((nothrow));

double mylog (double d) { return log(d); }

The intention is to emit log_finite, no matter the -ffast-math. Both GCC and Clang (after this patch) emit jmp log_finite.

The previous behavior (according to your comment, with an older glibc) was undesired. So I think the right suggestion is to upgrade glibc.

For your example:

extern double log (double) asm ("" "log_finite") __attribute__ ((nothrow));

double mylog (double d) { return log(d); }

The intention is to emit log_finite, no matter the -ffast-math. Both GCC and Clang (after this patch) emit jmp log_finite.

The previous behavior (according to your comment, with an older glibc) was undesired. So I think the right suggestion is to upgrade glibc.

Then there optimizations like vectorization, inst combine which works on the LLVM intrinsics. But they will be not happening now with log_finite calls.
is it possible to add an option to disable this feature, in case anyone who wants to avail these optimization?

For your example:

extern double log (double) asm ("" "log_finite") __attribute__ ((nothrow));

double mylog (double d) { return log(d); }

The intention is to emit log_finite, no matter the -ffast-math. Both GCC and Clang (after this patch) emit jmp log_finite.

The previous behavior (according to your comment, with an older glibc) was undesired. So I think the right suggestion is to upgrade glibc.

Then there optimizations like vectorization, inst combine which works on the LLVM intrinsics. But they will be not happening now with log_finite calls.
is it possible to add an option to disable this feature, in case anyone who wants to avail these optimization?

I lean toward not adding an option since newer glibc does not have the problem (and the new Clang behavior is consistent with GCC).

For your example:

extern double log (double) asm ("" "log_finite") __attribute__ ((nothrow));

double mylog (double d) { return log(d); }

The intention is to emit log_finite, no matter the -ffast-math. Both GCC and Clang (after this patch) emit jmp log_finite.

The previous behavior (according to your comment, with an older glibc) was undesired. So I think the right suggestion is to upgrade glibc.

Then there optimizations like vectorization, inst combine which works on the LLVM intrinsics. But they will be not happening now with log_finite calls.

I'm not sure about the expected semantics/lowering for the finite calls, but can you add something under LibCallSimplifier::optimizeFloatingPointLibCall() that would turn it into the LLVM log intrinsic with appropriate FMF? Codegen would need to be responsible for converting it back to finite call(s) if those are available?

For your example:

extern double log (double) asm ("" "log_finite") __attribute__ ((nothrow));

double mylog (double d) { return log(d); }

The intention is to emit log_finite, no matter the -ffast-math. Both GCC and Clang (after this patch) emit jmp log_finite.

The previous behavior (according to your comment, with an older glibc) was undesired. So I think the right suggestion is to upgrade glibc.

Then there optimizations like vectorization, inst combine which works on the LLVM intrinsics. But they will be not happening now with log_finite calls.

I'm not sure about the expected semantics/lowering for the finite calls, but can you add something under LibCallSimplifier::optimizeFloatingPointLibCall() that would turn it into the LLVM log intrinsic with appropriate FMF? Codegen would need to be responsible for converting it back to finite call(s) if those are available?

Hi Sanjay I thought codegen no longer lowers them back to finite calls
https://reviews.llvm.org/rGcd0926d087a85c5ee1222ca80980b4440214a822

For your example:

extern double log (double) asm ("" "log_finite") __attribute__ ((nothrow));

double mylog (double d) { return log(d); }

The intention is to emit log_finite, no matter the -ffast-math. Both GCC and Clang (after this patch) emit jmp log_finite.

The previous behavior (according to your comment, with an older glibc) was undesired. So I think the right suggestion is to upgrade glibc.

Then there optimizations like vectorization, inst combine which works on the LLVM intrinsics. But they will be not happening now with log_finite calls.

I'm not sure about the expected semantics/lowering for the finite calls, but can you add something under LibCallSimplifier::optimizeFloatingPointLibCall() that would turn it into the LLVM log intrinsic with appropriate FMF? Codegen would need to be responsible for converting it back to finite call(s) if those are available?

Hi Sanjay I thought codegen no longer lowers them back to finite calls
https://reviews.llvm.org/rGcd0926d087a85c5ee1222ca80980b4440214a822

There was a comment in D74712 that we might be moving too fast. If you would like to revert/adjust that patch, raise a bug or post a patch to discuss? If the goal is to be able to vectorize the finite calls, then we will need some way to represent those. Alternatively, we could change something in the cost models to enable more unrolling, etc?

For your example:

extern double log (double) asm ("" "log_finite") __attribute__ ((nothrow));

double mylog (double d) { return log(d); }

The intention is to emit log_finite, no matter the -ffast-math. Both GCC and Clang (after this patch) emit jmp log_finite.

The previous behavior (according to your comment, with an older glibc) was undesired. So I think the right suggestion is to upgrade glibc.

Then there optimizations like vectorization, inst combine which works on the LLVM intrinsics. But they will be not happening now with log_finite calls.

I'm not sure about the expected semantics/lowering for the finite calls, but can you add something under LibCallSimplifier::optimizeFloatingPointLibCall() that would turn it into the LLVM log intrinsic with appropriate FMF? Codegen would need to be responsible for converting it back to finite call(s) if those are available?

Hi Sanjay I thought codegen no longer lowers them back to finite calls
https://reviews.llvm.org/rGcd0926d087a85c5ee1222ca80980b4440214a822

There was a comment in D74712 that we might be moving too fast. If you would like to revert/adjust that patch, raise a bug or post a patch to discuss? If the goal is to be able to vectorize the finite calls, then we will need some way to represent those. Alternatively, we could change something in the cost models to enable more unrolling, etc?

The goal is able to vectorize the finite call. On a newer glibc 2.31 same test case would undergo all the optimizations. As you said we can try to lower in to intrinsic in LibCallSimplifier and then generate the finite calls in code generator again.

I will raise a bug as of now.

Not sure if this is related, but on SPARC, stage2 builds recently started to fail with:

[379/5552] Building CXX object projects/compiler-rt/lib/ubsan/CMakeFiles/RTUbsan.sparc.dir/ubsan_diag.cpp.o
FAILED: projects/compiler-rt/lib/ubsan/CMakeFiles/RTUbsan.sparc.dir/ubsan_diag.cpp.o
/home/glaubitz/llvm-project/build/./bin/clang++ -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Iprojects/compiler-rt/lib/ubsan -I/home/glaubitz/llvm-project/compiler-rt/lib/ubsan -Iinclude -I/home/glaubitz/llvm-project/llvm/include -I/home/glaubitz/llvm-project/compiler-rt/lib/ubsan/.. -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -fdiagnostics-color -Wall -std=c++14 -Wno-unused-parameter -g  -m32 -fPIC -fno-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables -fno-stack-protector -fno-sanitize=safe-stack -fvisibility=hidden -fno-lto -O3 -gline-tables-only -Wno-gnu -Wno-variadic-macros -Wno-c99-extensions -nostdinc++ -fno-rtti -DUBSAN_CAN_USE_CXXABI -std=c++14 -MD -MT projects/compiler-rt/lib/ubsan/CMakeFiles/RTUbsan.sparc.dir/ubsan_diag.cpp.o -MF projects/compiler-rt/lib/ubsan/CMakeFiles/RTUbsan.sparc.dir/ubsan_diag.cpp.o.d -o projects/compiler-rt/lib/ubsan/CMakeFiles/RTUbsan.sparc.dir/ubsan_diag.cpp.o -c /home/glaubitz/llvm-project/compiler-rt/lib/ubsan/ubsan_diag.cpp
In file included from /home/glaubitz/llvm-project/compiler-rt/lib/ubsan/ubsan_diag.cpp:25:
In file included from /usr/include/stdio.h:870:
/usr/include/bits/stdio-ldbl.h:26:20: error: cannot apply asm label to function after its first use
__LDBL_REDIR_DECL (vfprintf)
~~~~~~~~~~~~~~~~~~~^~~~~~~~~
/usr/include/sys/cdefs.h:467:26: note: expanded from macro '__LDBL_REDIR_DECL'
  extern __typeof (name) name __asm (__ASMNAME ("__nldbl_" #name));
                         ^           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
[412/5552] Building CXX object lib/MC/MCParser/CMakeFiles/LLVMMCParser.dir/MasmParser.cpp.o
ninja: build stopped: subcommand failed.

See: http://lab.llvm.org:8014/#/builders/134