HomePhabricator

[gcov] Delete CC1 option -coverage-no-function-names-in-data

Authored by MaskRay on May 10 2020, 10:50 AM.

Description

[gcov] Delete CC1 option -coverage-no-function-names-in-data

rL144865 incorrectly wrote function names for GCOV_TAG_FUNCTION
(this might be part of the reasons the header says
"We emit files in a corrupt version of GCOV's "gcda" file format").

rL176173 and rL177475 realized the problem and introduced -coverage-no-function-names-in-data
to work around the issue. (However, the description is wrong.
libgcov never writes function names, even before GCC 4.2).

In reality, the linker command line has to look like:

clang --coverage -Xclang -coverage-version='407*' -Xclang -coverage-cfg-checksum -Xclang -coverage-no-function-names-in-data

Failing to pass -coverage-no-function-names-in-data can make gcov 4.7~7
either produce wrong results (for one gcov-4.9 program, I see "No executable lines")
or segfault (gcov-7).
(gcov-8 uses an incompatible format.)

This patch deletes -coverage-no-function-names-in-data and the related
function names support from libclang_rt.profile

Details

Committed
MaskRayMay 10 2020, 12:37 PM
Parents
rGe4c454b065be: [X86] Add a few more shuffles to hasUndefRegUpdate.
Branches
Unknown
Tags
Unknown

Event Timeline

sylvestre.ledru added a subscriber: sylvestre.ledru.

@MaskRay Please don't land such patch without review.

This option is (was) used by some projects (ex: Firefox)
and you didn't documented this change in the clang release notes

MaskRay added a comment.EditedMay 13 2020, 9:00 AM

@MaskRay Please don't land such patch without review.

This option is (was) used by some projects (ex: Firefox)
and you didn't documented this change in the clang release notes

Can you point to me how Firefox uses -Xclang -coverage-no-function-names-in-data? Debian code search does not reveal anything.

In general I don't think these obscure CC1 options need to be kept for compatibility.

gcov generation in LLVM was in a very sad situation. One needed -Xclang -coverage-cfg-checksum -Xclang -coverage-no-function-names-in-data -Xclang -coverage-version='407*' to be compatible with gcov 4.7 and another -Xclang -coverage-exit-block-before-body to be compatible with gcov 4.8~7.

Please also note that both authors of rL144865 and rL176173 are no longer active now.

Can you point to me how Firefox uses -Xclang -coverage-no-function-names-in-data? Debian code search does not reveal anything.

https://searchfox.org/mozilla-central/source/build/moz.configure/toolchain.configure#1820

In general I don't think these obscure CC1 options need to be kept for compatibility.

I am not argue on this, just that such non-minor changes should be reviewed and documented.

Please also note that both authors of rL144865 and rL176173 are no longer active now.

Others like @jfb, @calixte, @kawashima-fj or @marco-c have been contributing to this part of the code and reviewing each others.

This broke GCOV support with LLVM in the Linux kernel. Going to second @sylvestre.ledru 's point about getting code reviewed. Same for https://reviews.llvm.org/rGcdd683b516d147925212724b09ec6fb792a40041.

This broke GCOV support with LLVM in the Linux kernel.

function_name in .gcda was a mistake in early implementation of clang gcov.
It somehow got the -coverage-no-function-names-in-data CC1 option (a CC1 option is an internal option; the user needs to be aware that such option may change semantics or be removed entirely.)
I had to keep instrumenter and libclang_rt.profile updated in a lock-step manner.

In the Linux kernel, kernel/gcov/clang.c provides an alternative implementation to libclang_rt.profile.
It needs updates when libclang_rt.profile interface changes. This is like sanitizer runtime API change - which is rare, but not impossible.
The developers can show courtesy and try avoiding such changes, but they reserve the rights to change.

Going to second @sylvestre.ledru 's point about getting code reviewed. Same for https://reviews.llvm.org/rGcdd683b516d147925212724b09ec6fb792a40041.

Replied on cdd683b516d147925212724b09ec6fb792a40041. I am fairly sure big-endian support was completely broken before my fixes.

In the Linux kernel, kernel/gcov/clang.c provides an alternative implementation to libclang_rt.profile.
It needs updates when libclang_rt.profile interface changes. This is like sanitizer runtime API change - which is rare, but not impossible.
The developers can show courtesy and try avoiding such changes, but they reserve the rights to change.

That's fair; it's hard to know what out of tree consumers might be affected such ABI changes. And if they don't have good coverage/CI, then this goes missed for some time. This is bad, and I'm fixing it in https://github.com/ClangBuiltLinux/continuous-integration2/pull/107.

But future changes to compiler-rt will need to consider ABI impacts to out of tree consumers, both GCOV and the sanitizers.

And there's no excuse for skipping code review. Get your code reviewed, please. That's a tenet to software engineering.

The original instrumentation and dumping tool (llvm-cov gcov) were incompatible with GCC gcov in a number of ways. If you fed its .gcno/.gcda to any gcov version, you would get an upfront segfault. With a few CC1 options, you can barely make gcov 4.9 recognize its output. llvm-cov gcov had incorrect line counting, redundant code and performance issues. There were a number of miscellaneous issues.

So I actually rewrote most part of llvm-cov gcov and made it able to read GCC 3.4~10 and the previous clang fake 4.2 format and did a large revamp on instrumentation code as well. The function name thing was a misunderstanding of the original developer. I could choose to remain compatibility but that would leave code burden in libclang_rt.profile . It is like PGO/sanitizer runtime. The interfaces are reasonably stable but "downstream is on their own" (the policy in practice is probably more polite than this but "on their own" is the important bit).

I thought I left some comments on https://github.com/ClangBuiltLinux/linux/issues about changed gcov interfaces but looks like I did not. Sorry about that.

The original instrumentation and dumping tool (llvm-cov gcov) were incompatible with GCC gcov in a number of ways. If you fed its .gcno/.gcda to any gcov version, you would get an upfront segfault. With a few CC1 options, you can barely make gcov 4.9 recognize its output. llvm-cov gcov had incorrect line counting, redundant code and performance issues. There were a number of miscellaneous issues.

Yes, the kernel has to split the runtime support for GCOV based on toolchain used to build the kernel.

So I actually rewrote most part of llvm-cov gcov and made it able to read GCC 3.4~10 and the previous clang fake 4.2 format and did a large revamp on instrumentation code as well. The function name thing was a misunderstanding of the original developer. I could choose to remain compatibility but that would leave code burden in libclang_rt.profile . It is like PGO/sanitizer runtime. The interfaces are reasonably stable but "downstream is on their own" (the policy in practice is probably more polite than this but "on their own" is the important bit).

The patches are definitely a welcome improvement; they will allow us to lower the overhead (runtime performance and memory usage) with GCOV.

I thought I left some comments on https://github.com/ClangBuiltLinux/linux/issues about changed gcov interfaces but looks like I did not. Sorry about that.

That's ok. It's very tricky to remember to check for ABI related breakages or remember/know that there are out of tree consumers. That's why code review is important, maybe one of the reviewers would have recalled that there might be out of tree consumers that could use a heads up.

The original instrumentation and dumping tool (llvm-cov gcov) were incompatible with GCC gcov in a number of ways. If you fed its .gcno/.gcda to any gcov version, you would get an upfront segfault. With a few CC1 options, you can barely make gcov 4.9 recognize its output. llvm-cov gcov had incorrect line counting, redundant code and performance issues. There were a number of miscellaneous issues.

Yes, the kernel has to split the runtime support for GCOV based on toolchain used to build the kernel.

So I actually rewrote most part of llvm-cov gcov and made it able to read GCC 3.4~10 and the previous clang fake 4.2 format and did a large revamp on instrumentation code as well. The function name thing was a misunderstanding of the original developer. I could choose to remain compatibility but that would leave code burden in libclang_rt.profile . It is like PGO/sanitizer runtime. The interfaces are reasonably stable but "downstream is on their own" (the policy in practice is probably more polite than this but "on their own" is the important bit).

The patches are definitely a welcome improvement; they will allow us to lower the overhead (runtime performance and memory usage) with GCOV.

I thought I left some comments on https://github.com/ClangBuiltLinux/linux/issues about changed gcov interfaces but looks like I did not. Sorry about that.

That's ok. It's very tricky to remember to check for ABI related breakages or remember/know that there are out of tree consumers. That's why code review is important, maybe one of the reviewers would have recalled that there might be out of tree consumers that could use a heads up.

When I made the change I was not aware of an alternative implementation...
GCC gcov is only guaranteed to be compatible within a major version (there are breaking changes in 4.8, 4.9, 8, 9) so our compatibility theory is actually better now that I added gcov_version to the runtime...

From a retrospective view, I still think I should make the changes. (There were various things needing revamp, keeping old compatibility code would be difficult for me to handle)
But I should communicate this to the kernel folks beforehand.

When I made the change I was not aware of an alternative implementation...
GCC gcov is only guaranteed to be compatible within a major version (there are breaking changes in 4.8, 4.9, 8, 9) so our compatibility theory is actually better now that I added gcov_version to the runtime...

Oh, 8 and 9 did, too? I was only aware of 4.8/4.9. The kernel now only support v4.9+. It looks like there's a kernel commit mentioning gcc-10 support, so it seems likely the kernel gcov maintainer is already aware of such changes.

From a retrospective view, I still think I should make the changes. (There were various things needing revamp, keeping old compatibility code would be difficult for me to handle)
But I should communicate this to the kernel folks beforehand.

That would be appreciated, but the point more so is about code review. I always wonder how you get so much great work done; if it's because you're skipping code review, well that's not great. Lately a lot of your changes to LLVM have caused significant burden for me personally, so I'm personally unhappy when I see code review being skipped which then leads to problems, for myself and others. I can't focus on my work if I'm constantly chasing backports for fixes because LLVM is so unstable.

To help with turnaround time, we, at Mozilla, are happy to quickly review your patches

MaskRay added a comment.EditedMar 12 2021, 1:31 PM

When I made the change I was not aware of an alternative implementation...
GCC gcov is only guaranteed to be compatible within a major version (there are breaking changes in 4.8, 4.9, 8, 9) so our compatibility theory is actually better now that I added gcov_version to the runtime...

Oh, 8 and 9 did, too? I was only aware of 4.8/4.9. The kernel now only support v4.9+. It looks like there's a kernel commit mentioning gcc-10 support, so it seems likely the kernel gcov maintainer is already aware of such changes.

They did. gcov actually warns if the version mismatches. If the kernel prints a version (hypothetical) 15 format, gcov-16 will warn. It may or may not parse the .gcda correctly.
I've tried various mismatching cases - it is not rare that some mismatch will cause an upfront sigsegv.

So I think you probably had too high of a compatibility expectation here.

I added gcov_version to this file, so you can see that 4.7 and 9 have clear runtime changes. I don't implement some corner case gcov features so gcov runtime may have more differences across versions.

From a retrospective view, I still think I should make the changes. (There were various things needing revamp, keeping old compatibility code would be difficult for me to handle)
But I should communicate this to the kernel folks beforehand.

That would be appreciated, but the point more so is about code review. I always wonder how you get so much great work done; if it's because you're skipping code review, well that's not great. Lately a lot of your changes to LLVM have caused significant burden for me personally, so I'm personally unhappy when I see code review being skipped which then leads to problems, for myself and others. I can't focus on my work if I'm constantly chasing backports for fixes because LLVM is so unstable.

A respectful long-time contributor told me "Im sure that you have noticed that I often push back hard on changes to the interface and intrinsic behaviour".
About the last point, I think the thing is really that the kernel has relied on many implementation details instead of interface things.

I probably had a difficult argument here because I did change the runtime interface: for big-endian support; for gcov compatibility; getting rid of legacy redundant stuff 'function_name',
if I did not do this, it would leave a code burden in llvm-project. The change would be perfectly fine if this runtime were indeed private to llvm-project.
Now I know that the kernel implements it, but it would be difficult for me to know that beforehand.

Regarding "I can't focus on my work if I'm constantly chasing backports for fixes because LLVM is so unstable."

Sorry, I'd have a different opinion.
Things like R_386_PLT32/R_X86_64_PLT32/_GLOBAL_OFFSET_TABLE_/.weak (changed binding to ...) are good examples that the kernel re-implements a good part of the toolchain and does things very differently.
What is perfectly fine for other consumers (the linkers have supported R_386_PLT32/R_X86_64_PLT32 for a really long time) can cause trouble to the kernel.
Another good example is various objtool diagnostics. objtool does serve a good use, however, it is overly nitpicking on a number of things. I think simplification on various LLVM features can provoke its diagnostics. I don't think back porting can be saved for it.

I've learned this from lessons by being involved in fixing the kernel issues. The respectful contributor told me: "thats part of the problem with code reviews - it can sometimes be challenging to get them, which is why Im not necessarily the biggest fan of them for OSS environments (but inside of a company, I think that they can be valuable) so, I don't blame you for the post-commit review approach. things break, thats a fact of software engineering; the important thing is to accept and acknowledge that and try to be careful and repair things quickly"

But I do hear your complaints. And now I get some experience what kind of changes may be disliked by the kernel even if they are perfectly fine for *every other user*.
OK, I am a ClangBuiltLinux contributor and I am happy to contribute more there so I am willing to perform more tests (the only testing approach I have is boot-utils/boot-qemu.sh -a $arch -k $dir)

But I do hear your complaints. And now I get some experience what kind of changes may be disliked by the kernel even if they are perfectly fine for *every other user*.

Thanks bud.

OK, I am a ClangBuiltLinux contributor and I am happy to contribute more there so I am willing to perform more tests (the only testing approach I have is boot-utils/boot-qemu.sh -a $arch -k $dir)

For most things I'm content with a quick boot test in an emulator/vm, too. Appreciated.

To help with turnaround time, we, at Mozilla, are happy to quickly review your patches

Thanks, I'll record you as who may be a potential reviewer. gcov is not loved by most groups (most use -fprofile-instr-generate -fcoverage-mapping instead; some may use -fsanitize-coverage).
That said, I think most stuff has been done with gcov. So new development may be rare.

That said, I think most stuff has been done with gcov. So new development may be rare.

sure but PLEASE ASK FOR REVIEW for others changes...
Looking at
https://reviews.llvm.org/rG0361e649759f90046f8f261365df488dc6f68342
https://reviews.llvm.org/rG7c5222e4d1a3a14f029e5f614c9aefd0fa505f1e
https://reviews.llvm.org/rGbfbfd83f147f0e02e652f3ae2635dd834c2e6b46
https://reviews.llvm.org/rG35dd6470de847636c212d7e0cd4d7ac2995679cc

and not listed here:
https://reviews.llvm.org/people/revisions/14377/

You recent changes haven't been reviewed and are breaking clang on i386... Making packages on apt.llvm.org broken...
https://bugs.llvm.org/show_bug.cgi?id=49827

Next time, I will have to escalate your permission abuse...
https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access

Also, please avoid using terms like madness, awful patch, etc when talking about the work of others...

MaskRay added a comment.EditedApr 3 2021, 2:16 PM

That said, I think most stuff has been done with gcov. So new development may be rare.

sure but PLEASE ASK FOR REVIEW for others changes...
Looking at
https://reviews.llvm.org/rG0361e649759f90046f8f261365df488dc6f68342
https://reviews.llvm.org/rG7c5222e4d1a3a14f029e5f614c9aefd0fa505f1e
https://reviews.llvm.org/rGbfbfd83f147f0e02e652f3ae2635dd834c2e6b46
https://reviews.llvm.org/rG35dd6470de847636c212d7e0cd4d7ac2995679cc

and not listed here:
https://reviews.llvm.org/people/revisions/14377/

You recent changes haven't been reviewed and are breaking clang on i386... Making packages on apt.llvm.org broken...
https://bugs.llvm.org/show_bug.cgi?id=49827

Next time, I will have to escalate your permission abuse...
https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access

@sylvestre.ledru
If you think a particular commit has caused regression, you can reply to that commit, and I will try addressing that soon.
Since you made a comment here, I'll reply here, too.

I know I skipped reviews and made you unhappy in two cases:
(1) gcov. I removed internal CC1 options and Firefox happened to depend on the particular -Xclang usage. There is an important point: CC1 options tend to change and any reliance will be brittle.
(2) clang --target=i686-linux-gnu on Debian i386(?)

I apologize for the inconvenience but I feel that some comments here are a bit trigger-happy.
There were extensive discussions about gcov in comments above so I'll not repeat.
I hope you can listen to more context.

For (2), I am a Debian user and from you email addresses you are a Debian dev. I appreciate your work but I also hoped you could read a bit more context before reaching to a conclusion.

If you compile with an older Clang for cross compilation, clang++ --target=aarch64-linux-gnu a.cc, you will be welcomed by this error:

/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../include/c++/10/iostream:38:10: fatal error: 'bits/c++config.h' file not found
#include <bits/c++config.h>
         ^~~~~~~~~~~~~~~~~~

The list of commits were to improve the situation and to made clang/lib/Driver into a more maintainable state.
And I am proud of getting to this state with these commits.
About the fixups: what kind of cross compilation should work does not have an authoritative definition.
I try not to underestimate the complexity so I usually test a wide range of configurations: Debian native gcc/aarch64-linux-gnu-gcc/powerpc64le-linux-gnu-gcc, sometimes Arch Linux gcc/Ubuntu aarch64/Ubuntu powerpc64le, sometimes gcc compiled from glibc-many-glibcs.py.

If you look at Clang today, clang++ --target=aarch64-linux-gnu a.cc works perfectly with Debian {aarch64,powerpc64le,...}-linux-gnu-g++, nearly perfect with Arch Linux aarch64-linux-gnu-g++.
Sure people like you may not notice this and just observe native clang --targeti686-linux-gnu on i386 now no longer works.
(About the Fuchsia and Chromium fixups: it turns out they have somewhat unusual use cases, e.g. compile-only (no link) goma usage without providing lib/x86_64-linux-gnu, i586-linux-gnu-gcc from an old Debian based rootfs. There were cases I wouldn't be able to predict in advance)
In the end I think I've done sufficient testing, and making such changes without review because they should be community consensus, and it would be difficult to find reviewers in this area (that could take months to get all these things done; and getting these review did not necessarily catch the issue beforehand).
My changes weren't too different from people working on other areas doing similar things (e.g. X86 backend/debug info/).
That may be the status quo of engineering. There are cases we tend to test little (e.g. on Debian i386; actually, llvm-project has less testing on 32-bit architectures) and it is difficult ensuring everything never regresses in 100% case.
As I said, I have tested many configurations and I am building more awareness since I have more ideas about Fuchsia/Chromium usage now.

If you give me instructions getting a minimal rootfs without too much setup (e.g. docker, qemu-system-x86_64), I'd be happy to help.
PS: if you search for madness in the commit messages of llvm-project in history, there are fairly many.
When I picked the word, I was referring to the previous state (huge amount of complexity that other contributors can hardly keep up with). I am certainly not making sarcasm on other contributors.

PS: I have a section in my https://maskray.me/blog/2021-03-28-compiler-driver-and-cross-compilation#Debian where I use mostly neutral sentences, and I do think Debian native gcc is not ideal.
Its cross compiler is quite good, though.

sylvestre.ledru added a comment.EditedApr 3 2021, 2:36 PM

Thanks for taking the time to reply

To reproduce:

sudo debootstrap --arch i386 buster buster.i386  
sudo mount -t proc /proc buster.i386/proc
sudo chroot buster.i386 
cd /root/
dget -x https://llvm-jenkins.debian.net/view/Debian%2010%20-%20Buster/job/llvm-toolchain-buster-source/lastSuccessfulBuild/artifact/llvm-toolchain-snapshot_13%7E++20210403102610+b32e76c6d507-1%7Eexp1%7E20210403203324.1412.dsc
dpkg-source -x llvm-toolchain-snapshot_13%7E++20210403102610+b32e76c6d507-1%7Eexp1%7E20210403203324.1412.dsc
cd llvm-toolchain-*
dpkg-buildpackage -nc -b
(install the missing deps)

Full log:
https://llvm-jenkins.debian.net/view/Debian%2010%20-%20Buster/job/llvm-toolchain-buster-binaries/1211/architecture=i386,distribution=buster/console

Will fail with

-- Looking for __atomic_fetch_add_4 in atomic - found
-- Performing Test HAVE_CXX_ATOMICS_WITH_LIB
-- Performing Test HAVE_CXX_ATOMICS_WITH_LIB - Failed
CMake Error at cmake/modules/CheckAtomic.cmake:56 (message):
  Host compiler must support std::atomic!
Call Stack (most recent call first):
  cmake/config-ix.cmake:367 (include)
  CMakeLists.txt:670 (include)

Which is actually just the missing header.
Seems that
/usr/lib/gcc/i686-linux-gnu/8/../../../../include/i386-linux-gnu/c++/8/64
was missing
(I agree that the path isn't ideal but Debian has 20 years history behind it and many of us are not paid for this ...)

Thanks for taking the time to reply

To reproduce:

sudo debootstrap --arch i386 buster buster.i386  
sudo mount -t proc /proc buster.i386/proc
sudo chroot buster.i386 
cd /root/
dget -x https://llvm-jenkins.debian.net/view/Debian%2010%20-%20Buster/job/llvm-toolchain-buster-source/lastSuccessfulBuild/artifact/llvm-toolchain-snapshot_13%7E++20210403102610+b32e76c6d507-1%7Eexp1%7E20210403203324.1412.dsc
dpkg-source -x llvm-toolchain-snapshot_13%7E++20210403102610+b32e76c6d507-1%7Eexp1%7E20210403203324.1412.dsc
cd llvm-toolchain-*
dpkg-buildpackage -nc -b
(install the missing deps)

Full log:
https://llvm-jenkins.debian.net/view/Debian%2010%20-%20Buster/job/llvm-toolchain-buster-binaries/1211/architecture=i386,distribution=buster/console

Will fail with

-- Looking for __atomic_fetch_add_4 in atomic - found
-- Performing Test HAVE_CXX_ATOMICS_WITH_LIB
-- Performing Test HAVE_CXX_ATOMICS_WITH_LIB - Failed
CMake Error at cmake/modules/CheckAtomic.cmake:56 (message):
  Host compiler must support std::atomic!
Call Stack (most recent call first):
  cmake/config-ix.cmake:367 (include)
  CMakeLists.txt:670 (include)

Which is actually just the missing header.
Seems that
/usr/lib/gcc/i686-linux-gnu/8/../../../../include/i386-linux-gnu/c++/8/64
was missing
(I agree that the path isn't ideal but Debian has 20 years history behind it and many of us are not paid for this ...)

My debootstrap command has run for more than 2 hours and there is no sign that it will finish anytime soon.
So I instead downloaded debian-10.9.0-i386-netinst.iso and installed it into a qemu-image created image.
I'll move the comment to the bug but I will explicitly state here that I don't accept your previous blame.
This is a Debian patch which deviates from upstream GCC behavior and it was not tested in clang/test/Driver.
It might happen to work previously.

root@debian-i386:/tmp/c# g++ --version
g++ (Debian 8.3.0-6) 8.3.0
...
g++ a.cc -fsyntax-only -v |& sed -E 's/ -"?[iIL]/\n&/g'
ignoring duplicate directory "/usr/include/i386-linux-gnu/c++/8"
ignoring nonexistent directory "/usr/local/include/i386-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/i686-linux-gnu/8/../../../../i686-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/c++/8
 /usr/include/i386-linux-gnu/c++/8   # the current clang driver tries to use /tmp/d/fake-i386/usr/lib/gcc/i686-linux-gnu/8/../../../../include/i686-linux-gnu/c++/8 while Debian i386 expects ../../../../include/i386-linux-gnu/c++/8  here
 /usr/include/c++/8/backward
 /usr/lib/gcc/i686-linux-gnu/8/include
 /usr/local/include
 /usr/lib/gcc/i686-linux-gnu/8/include-fixed
 /usr/include/i386-linux-gnu
 /usr/include