rL82131 changed -O from -O1 to -O2, because -O1 was not different from
-O2 at that time.
GCC treats -O as -O1 and there is now work to make -O1 meaningful.
We can change -O back to -O1 again.
Differential D79916
Map -O to -O1 instead of -O2 MaskRay on May 13 2020, 5:32 PM. Authored by
Details rL82131 changed -O from -O1 to -O2, because -O1 was not different from GCC treats -O as -O1 and there is now work to make -O1 meaningful.
Diff Detail
Unit Tests Event TimelineComment Actions Actually it was http://llvm.org/r82131 that mapped -O to -O2 (I just refactored it). Originally it seems it was mapped to -O1. Because this seems to have done quite intentionally, I'm a little vary of changing it back. I'm not sure who would be a good person to have insights here though. +echristo maybe? Comment Actions I'm totally down, but you knew that already :) Duncan: Do you have any concerns? I doubt it, but just checking. Comment Actions Thanks! (I was just about to ask whether you have a quick channel to Gerolf :) I'll re-test and commit. Comment Actions This has significantly regressed FreeBSD's performance with the new version of Clang. It seems Clang does not inline functions at -O1, unlike GCC, and since FreeBSD currently compiles its kernel with -O whenever debug symbols are enabled[1] (which, of course, is almost always true), this results in all its static inline helper functions not being inlined at all, a pattern that is common in the kernel, used for things like get_curthread and the atomics implementations. [1] This is a dubious decision made in r140400 in 2005 to provide "truer debugger stack traces" (well, before then there was ping-ponging between -O and -O2 based on concerns around correctness vs performance, but amd64 is an exception that has always used -O2 since r127180 it seems). Given that GCC will inline at -O, at least these days, the motivation seems to no longer exist, and compiling a kernel at anything other than -O2 (or maybe -O3) seems like a silly thing to do, but nevertheless it's what is currently done. Comment Actions This is actually SUCH a bad idea that a kernel built with -O will *not work at all* on 32 bit powerpc platforms (presumably due to allocating stack frames in the middle of assembly fragments in the memory management that are supposed to be inlined at all times.) I had to hack kern.pre.mk to request -O2 at all times when building powerpc/powerpcspe just to get a functioning kernel. Comment Actions Well, -O0, -O1, -O2 and -O should all produce working kernels, and any cases where they don't are compiler bugs (or kernel bugs if they rely on UB) that should be fixed, not worked around by tweaking the compiler flags in a fragile way until you get the behaviour relied on. Correctness and performance are very different issues here. Comment Actions As an example: static __inline void mtsrin(vm_offset_t va, register_t value) { __asm __volatile ("mtsrin %0,%1; isync" :: "r"(value), "r"(va)); } This code is used in the mmu when bootstrapping the cpu like so: for (i = 0; i < 16; i++) mtsrin(i << ADDR_SR_SHFT, kernel_pmap->pm_sr[i]); powerpc_sync(); sdr = (u_int)moea_pteg_table | (moea_pteg_mask >> 10); __asm __volatile("mtsdr1 %0" :: "r"(sdr)); isync(); tlbia(); During the loop there, we are in the middle of programming the MMU segment registers in real mode, and is supposed to be doing all work out of registers. (and powerpc_sync() and isync() should be expanded to their single assembly instruction, not a function call. The whole point of calling those is that we are in an inconsistent hardware state and need to sync up before continuing execution) If there isn't a way to force inlining, we will have to change to using preprocessor macros in cpufunc.h. Comment Actions There is, it's called __attribute__((always_inline)) and supported by both GCC and Clang. But at -O0 you'll still have register allocation to deal with, so really that code is just fundamentally broken and should not be written in C. There is no way for you to guarantee stack spills are not used, it's way out of scope for C. Comment Actions Actually, this is probably a bad example. Since we're in real mode it doesn't really matter. But I can see other places where powerpc_sync() / isync() are dangerous to expand to a function call. Comment Actions Is there a way to have always_inline and unused at the same time? I tried using always_inline and it caused warnings in things that used *parts* of cpufunc.h. Comment Actions Both __attribute__((always_inline)) __attribute__((unused)) and __attribute__((always_inline, unused)) work, but really you should use __always_inline __unused in FreeBSD (which will expand to the former). Comment Actions But also you really should not get warnings for unused static functions in included headers, only ones defined in the C source file itself. We'd have countless warnings in the kernel across all architectures otherwise. Comment Actions I agree. But that's what it is doing when using always_inline in combination with -Wunused-function. There is currently no real usage of always_inline in system headers though, so maybe I'm just the first to complain about it? Comment Actions We use them in CheriBSD and have no such issues that I've ever noticed. When was the last time you checked (and what compiler)? Comment Actions Five minutes ago, FreeBSD clang version 11.0.0 (git@github.com:llvm/llvm-project.git llvmorg-11.0.0-rc2-0-g414f32a9e86) Using always_inline unused appears to work to silence the warnings, however, so that is probably workable. Comment Actions Several previous comments are FreeBSD specific. To we clang developers, the concrete request is
right? I think this makes sense, especially when inline is explicitly specified... This appears to be related to some -O1 work @echristo is working on. // gcc -O1 and g++ -O1 inline `foo`. Note that in C99 mode, `extern int foo` is needed to ask the compiler to provide an external definition. // clang -O1 and clang++ -O1 do not inline `foo` inline int foo(int a) { return a + a; } int bar(int a, int b) { return foo(a + b); } Comment Actions Yes, inline should certainly be inlined, but also non-inline things should too. Perhaps not so aggressively, but there's no good reason not to, really, it can be a big win with a simple transformation. GCC seems to inline even those: # echo 'static void foo(void) { __asm__ __volatile__ ("#asdf"); } void bar(void) { foo(); } void baz(void) { foo(); foo(); }' | gcc -x c - -o - -S -O1 .file "" .text .globl bar .type bar, @function bar: .LFB1: .cfi_startproc #APP # 1 "<stdin>" 1 #asdf # 0 "" 2 #NO_APP ret .cfi_endproc .LFE1: .size bar, .-bar .globl baz .type baz, @function baz: .LFB2: .cfi_startproc #APP # 1 "<stdin>" 1 #asdf # 0 "" 2 # 1 "<stdin>" 1 #asdf # 0 "" 2 #NO_APP ret .cfi_endproc .LFE2: .size baz, .-baz .ident "GCC: (Debian 10.2.0-8) 10.2.0" .section .note.GNU-stack,"",@progbits |