rL82131 changed -O from -O1 to -O2, because -O1 was not different from
-O2 at that time.
GCC treats -O as -O1 and there is now work to make -O1 meaningful.
We can change -O back to -O1 again.
Paths
| Differential D79916
Map -O to -O1 instead of -O2 ClosedPublic Authored by MaskRay on May 13 2020, 5:32 PM.
Details Summary rL82131 changed -O from -O1 to -O2, because -O1 was not different from GCC treats -O as -O1 and there is now work to make -O1 meaningful.
Diff Detail
Event TimelineMaskRay added a child revision: D79919: [Driver] Pass -plugin-opt=O2 for -Os -Oz and -plugin-opt=O1 for -Og.May 13 2020, 5:53 PM Comment Actions Actually it was http://llvm.org/r82131 that mapped -O to -O2 (I just refactored it). Originally it seems it was mapped to -O1. Because this seems to have done quite intentionally, I'm a little vary of changing it back. I'm not sure who would be a good person to have insights here though. +echristo maybe? Herald added subscribers: dexonsmith, steven_wu, hiraditya. · View Herald TranscriptMay 14 2020, 10:49 AM Comment Actions I'm totally down, but you knew that already :) Duncan: Do you have any concerns? I doubt it, but just checking. This revision is now accepted and ready to land.May 15 2020, 4:57 PM Comment Actions
Xcode doesn't use -O. There could be some internal users, but I doubt it, and we can probably migrate them if this causes a problem. @arphaman, WDYT? @Gerolf, I don't imagine you have any concerns, but thought I should double-check. Comment Actions
LGTM Comment Actions
Gerolf told me he has no concerns. Comment Actions
Thanks! (I was just about to ask whether you have a quick channel to Gerolf :) I'll re-test and commit. Closed by commit rG82904401e327: Map -O to -O1 instead of -O2 (authored by MaskRay). · Explain WhyMay 18 2020, 4:17 PM This revision was automatically updated to reflect the committed changes. Comment Actions This has significantly regressed FreeBSD's performance with the new version of Clang. It seems Clang does not inline functions at -O1, unlike GCC, and since FreeBSD currently compiles its kernel with -O whenever debug symbols are enabled[1] (which, of course, is almost always true), this results in all its static inline helper functions not being inlined at all, a pattern that is common in the kernel, used for things like get_curthread and the atomics implementations. [1] This is a dubious decision made in r140400 in 2005 to provide "truer debugger stack traces" (well, before then there was ping-ponging between -O and -O2 based on concerns around correctness vs performance, but amd64 is an exception that has always used -O2 since r127180 it seems). Given that GCC will inline at -O, at least these days, the motivation seems to no longer exist, and compiling a kernel at anything other than -O2 (or maybe -O3) seems like a silly thing to do, but nevertheless it's what is currently done. Comment Actions
This is actually SUCH a bad idea that a kernel built with -O will *not work at all* on 32 bit powerpc platforms (presumably due to allocating stack frames in the middle of assembly fragments in the memory management that are supposed to be inlined at all times.) I had to hack kern.pre.mk to request -O2 at all times when building powerpc/powerpcspe just to get a functioning kernel. Comment Actions
Well, -O0, -O1, -O2 and -O should all produce working kernels, and any cases where they don't are compiler bugs (or kernel bugs if they rely on UB) that should be fixed, not worked around by tweaking the compiler flags in a fragile way until you get the behaviour relied on. Correctness and performance are very different issues here. Comment Actions
As an example: static __inline void mtsrin(vm_offset_t va, register_t value) { __asm __volatile ("mtsrin %0,%1; isync" :: "r"(value), "r"(va)); } This code is used in the mmu when bootstrapping the cpu like so: for (i = 0; i < 16; i++) mtsrin(i << ADDR_SR_SHFT, kernel_pmap->pm_sr[i]); powerpc_sync(); sdr = (u_int)moea_pteg_table | (moea_pteg_mask >> 10); __asm __volatile("mtsdr1 %0" :: "r"(sdr)); isync(); tlbia(); During the loop there, we are in the middle of programming the MMU segment registers in real mode, and is supposed to be doing all work out of registers. (and powerpc_sync() and isync() should be expanded to their single assembly instruction, not a function call. The whole point of calling those is that we are in an inconsistent hardware state and need to sync up before continuing execution) If there isn't a way to force inlining, we will have to change to using preprocessor macros in cpufunc.h. Comment Actions
There is, it's called __attribute__((always_inline)) and supported by both GCC and Clang. But at -O0 you'll still have register allocation to deal with, so really that code is just fundamentally broken and should not be written in C. There is no way for you to guarantee stack spills are not used, it's way out of scope for C. Comment Actions
Actually, this is probably a bad example. Since we're in real mode it doesn't really matter. But I can see other places where powerpc_sync() / isync() are dangerous to expand to a function call. Comment Actions
Is there a way to have always_inline and unused at the same time? I tried using always_inline and it caused warnings in things that used *parts* of cpufunc.h. Comment Actions
Both __attribute__((always_inline)) __attribute__((unused)) and __attribute__((always_inline, unused)) work, but really you should use __always_inline __unused in FreeBSD (which will expand to the former). Comment Actions
But also you really should not get warnings for unused static functions in included headers, only ones defined in the C source file itself. We'd have countless warnings in the kernel across all architectures otherwise. Comment Actions
I agree. But that's what it is doing when using always_inline in combination with -Wunused-function. There is currently no real usage of always_inline in system headers though, so maybe I'm just the first to complain about it? Comment Actions
We use them in CheriBSD and have no such issues that I've ever noticed. When was the last time you checked (and what compiler)? Comment Actions
Five minutes ago, FreeBSD clang version 11.0.0 (git@github.com:llvm/llvm-project.git llvmorg-11.0.0-rc2-0-g414f32a9e86) Using always_inline unused appears to work to silence the warnings, however, so that is probably workable. Comment Actions Several previous comments are FreeBSD specific. To we clang developers, the concrete request is
right? I think this makes sense, especially when inline is explicitly specified... This appears to be related to some -O1 work @echristo is working on. // gcc -O1 and g++ -O1 inline `foo`. Note that in C99 mode, `extern int foo` is needed to ask the compiler to provide an external definition. // clang -O1 and clang++ -O1 do not inline `foo` inline int foo(int a) { return a + a; } int bar(int a, int b) { return foo(a + b); } Comment Actions
Yes, inline should certainly be inlined, but also non-inline things should too. Perhaps not so aggressively, but there's no good reason not to, really, it can be a big win with a simple transformation. GCC seems to inline even those: # echo 'static void foo(void) { __asm__ __volatile__ ("#asdf"); } void bar(void) { foo(); } void baz(void) { foo(); foo(); }' | gcc -x c - -o - -S -O1 .file "" .text .globl bar .type bar, @function bar: .LFB1: .cfi_startproc #APP # 1 "<stdin>" 1 #asdf # 0 "" 2 #NO_APP ret .cfi_endproc .LFE1: .size bar, .-bar .globl baz .type baz, @function baz: .LFB2: .cfi_startproc #APP # 1 "<stdin>" 1 #asdf # 0 "" 2 # 1 "<stdin>" 1 #asdf # 0 "" 2 #NO_APP ret .cfi_endproc .LFE2: .size baz, .-baz .ident "GCC: (Debian 10.2.0-8) 10.2.0" .section .note.GNU-stack,"",@progbits
Revision Contents
Diff 264749 clang/include/clang/Driver/Options.td
clang/lib/Frontend/CompilerInvocation.cpp
clang/test/CodeGen/builtins-systemz-zvector-constrained.c
clang/test/CodeGen/builtins-systemz-zvector.c
clang/test/CodeGen/builtins-systemz-zvector2-constrained.c
clang/test/CodeGen/builtins-systemz-zvector2.c
clang/test/CodeGen/builtins-systemz-zvector3-constrained.c
clang/test/CodeGen/builtins-systemz-zvector3.c
clang/test/CodeGen/fma-builtins-constrained.c
clang/test/Driver/O.c
clang/test/Driver/clang_f_opts.c
clang/test/Driver/lto.c
|