Page MenuHomePhabricator

[compiler-rt][profile] Fix various InstrProf tests on Solaris
ClosedPublic

Authored by ro on Aug 3 2020, 3:26 AM.

Details

Summary

Currently, several InstrProf tests FAIL on Solaris (both sparc and x86):

Profile-i386 :: Posix/instrprof-visibility.cpp
Profile-i386 :: instrprof-merging.cpp
Profile-i386 :: instrprof-set-file-object-merging.c
Profile-i386 :: instrprof-set-file-object.c

On sparc there's also

Profile-sparc :: coverage_comments.cpp

The failure mode is always the same:

error: /var/llvm/local-amd64/projects/compiler-rt/test/profile/Profile-i386/Posix/Output/instrprof-visibility.cpp.tmp: Failed to load coverage: Malformed coverage data

The error is from llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp (loadBinaryFormat), l.926:

InstrProfSymtab ProfileNames;
std::vector<SectionRef> NamesSectionRefs = *NamesSection;
if (NamesSectionRefs.size() != 1)
  return make_error<CoverageMapError>(coveragemap_error::malformed);

where .size() is 2 instead.

Looking at the executable, I find (with elfdump -c -N __llvm_prf_names):

Section Header[15]:  sh_name: __llvm_prf_names
    sh_addr:      0x8053ca5       sh_flags:   [ SHF_ALLOC ]
    sh_size:      0x86            sh_type:    [ SHT_PROGBITS ]
    sh_offset:    0x3ca5          sh_entsize: 0
    sh_link:      0               sh_info:    0
    sh_addralign: 0x1       

Section Header[31]:  sh_name: __llvm_prf_names
    sh_addr:      0x8069998       sh_flags:   [ SHF_WRITE SHF_ALLOC ]
    sh_size:      0               sh_type:    [ SHT_PROGBITS ]
    sh_offset:    0x9998          sh_entsize: 0
    sh_link:      0               sh_info:    0
    sh_addralign: 0x1

Unlike GNU ld (which primarily operates on section names) the Solaris linker, following the ELF spirit, only merges input sections into an output section if both section name and section flags match, so two separate sections are maintained.

The read-write one comes from lib/clang/12.0.0/lib/sunos/libclang_rt.profile-i386.a(InstrProfilingPlatformLinux.c.o) while the read-only one is generated by llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp

(`InstrProfiling::emitNameData`) at l.1004 where `isConstant = true`.

The easiest way to avoid the mismatch is to change the definition in compiler-rt/lib/profile/InstrProfilingPlatformLinux.c to const.

This fixes all failures observed.

Tested on amd64-pc-solaris2.11, sparcv9-sun-solaris2.11, and x86_64-pc-linux-gnu.

Diff Detail

Event Timeline

ro created this revision.Aug 3 2020, 3:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2020, 3:26 AM
Herald added subscribers: Restricted Project, fedor.sergeev, dberris, jyknight. · View Herald Transcript
ro requested review of this revision.Aug 3 2020, 3:26 AM
davidxl accepted this revision.Aug 3 2020, 9:58 AM

Looks good to me.

This revision is now accepted and ready to land.Aug 3 2020, 9:58 AM
This revision was landed with ongoing or failed builds.Aug 3 2020, 10:56 AM
This revision was automatically updated to reflect the committed changes.

We are seeing regressions on Linux, from this. See https://github.com/google/oss-fuzz/issues/4348. Can you please revert as it breaks coverage on several linux binaries.

ro added a comment.Aug 24 2020, 2:51 AM

We are seeing regressions on Linux, from this. See https://github.com/google/oss-fuzz/issues/4348. Can you please revert as it breaks coverage on several linux binaries.

Can do, however I'd first try if the issue can be fixed quickly instead of trading breakage on one platform for breakage on another.

First, how did you verify that this patch is the culprit? While it seems plausible, I've seen too many cases where a likely candidate turned out to be innocent, wasting everyone's time.

Can you please also provide lots of additional information:

  • Which platform are you seeing this error on?
  • Which revision of clang were you using? Any local changes? How exactly was cmake invoked?
  • Which linker do you use?
  • Can you provide the output of readelf -SW for the binary that produces the error, the version of libclang_rt.profile-*.alinked, and if possible for the input objects that have __llvm_prf_names sections?

First, how did you verify that this patch is the culprit? While it seems plausible, I've seen too many cases where a likely candidate turned out to be innocent, wasting everyone's time.

When chromium folks were rolling clang, it was the only suspect in code coverage changes - llvmorg-12-init-1771-g1bd7046e : llvmorg-12-init-3492-ga1caa302 [like git hash, these are tags, git checkout to them work ]

Can you please also provide lots of additional information:

Which platform are you seeing this error on?

Seen on linux. See red ones on https://oss-fuzz-build-logs.storage.googleapis.com/index.html#curl [for now, i had to hardcode a older clang commit]

Which revision of clang were you using? Any local changes? How exactly was cmake invoked?

https://github.com/google/oss-fuzz/blob/master/infra/base-images/base-clang/checkout_build_install_llvm.sh

Which linker do you use?

System linker on ubuntu 16.04 inside docker. GNU ld (GNU Binutils for Ubuntu) 2.26.1

Can you provide the output of readelf -SW for the binary that produces the error, the version of libclang_rt.profile-*.alinked, and if possible for the input objects that have __llvm_prf_names sections?

libclang_rt.profile-x86_64.a

readelf -SW curl_fuzzer_ldap

There are 48 section headers, starting at offset 0xdc56e8:

Section Headers:

[Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
[ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
[ 1] .interp           PROGBITS        0000000000400270 000270 00001c 00   A  0   0  1
[ 2] .note.ABI-tag     NOTE            000000000040028c 00028c 000020 00   A  0   0  4
[ 3] .gnu.hash         GNU_HASH        00000000004002b0 0002b0 0003d8 00   A  4   0  8
[ 4] .dynsym           DYNSYM          0000000000400688 000688 0026a0 18   A  5   1  8
[ 5] .dynstr           STRTAB          0000000000402d28 002d28 001a18 00   A  0   0  1
[ 6] .gnu.version      VERSYM          0000000000404740 004740 000338 02   A  4   0  2
[ 7] .gnu.version_r    VERNEED         0000000000404a78 004a78 000180 00   A  5   7  8
[ 8] .rela.dyn         RELA            0000000000404bf8 004bf8 0003f0 18   A  4   0  8
[ 9] .rela.plt         RELA            0000000000404fe8 004fe8 001980 18  AI  4  28  8
[10] .init             PROGBITS        0000000000406968 006968 00001f 00  AX  0   0  4
[11] .plt              PROGBITS        0000000000406990 006990 001110 10  AX  0   0 16
[12] .plt.got          PROGBITS        0000000000407aa0 007aa0 000068 00  AX  0   0  8
[13] .text             PROGBITS        0000000000407c00 007c00 30c58f 00  AX  0   0 256
[14] .fini             PROGBITS        0000000000714190 314190 000009 00  AX  0   0  4
[15] .rodata           PROGBITS        0000000000715000 315000 0adb73 00   A  0   0 4096
[16] __llvm_prf_names  PROGBITS        00000000007c2b73 3c2b73 02ac61 00   A  0   0  1
[17] .eh_frame_hdr     PROGBITS        00000000007ed7d4 3ed7d4 01383c 00   A  0   0  4
[18] .eh_frame         PROGBITS        0000000000801010 401010 0656e4 00   A  0   0  8
[19] .gcc_except_table PROGBITS        00000000008666f4 4666f4 0003a8 00   A  0   0  4
[20] .tbss             NOBITS          0000000000a67590 467590 000038 00 WAT  0   0  8
[21] .preinit_array    PREINIT_ARRAY   0000000000a67590 467590 000010 00  WA  0   0  8
[22] .init_array       INIT_ARRAY      0000000000a675a0 4675a0 000030 00  WA  0   0  8
[23] .fini_array       FINI_ARRAY      0000000000a675d0 4675d0 000008 00  WA  0   0  8
[24] .jcr              PROGBITS        0000000000a675d8 4675d8 000008 00  WA  0   0  8
[25] .data.rel.ro      PROGBITS        0000000000a675e0 4675e0 00a698 00  WA  0   0 16
[26] .dynamic          DYNAMIC         0000000000a71c78 471c78 000250 10  WA  5   0  8
[27] .got              PROGBITS        0000000000a71ec8 471ec8 000138 08  WA  0   0  8
[28] .got.plt          PROGBITS        0000000000a72000 472000 000898 08  WA  0   0  8
[29] .data             PROGBITS        0000000000a728a0 4728a0 014cd8 00  WA  0   0 16
[30] __llvm_prf_cnts   PROGBITS        0000000000a87578 487578 0766c0 00  WA  0   0  8
[31] __llvm_prf_data   PROGBITS        0000000000afdc38 4fdc38 057f60 00  WA  0   0  8
[32] __llvm_prf_vnds   PROGBITS        0000000000b55ba0 555ba0 006000 00  WA  0   0 16
[33] .bss              NOBITS          0000000000b5bc00 55bba0 94cf10 00  WA  0   0 512
[34] .comment          PROGBITS        0000000000000000 55bba0 00009e 01  MS  0   0  1
[35] .deplibs          LOOS+fff4c04    0000000000000000 55bc3e 00000b 01  MS  0   0  1
[36] __llvm_covfun     PROGBITS        0000000000000000 55bc50 252b95 00      0   0  8
[37] __llvm_covmap     PROGBITS        0000000000000000 7ae7e8 01b6e0 00      0   0  8
[38] .debug_aranges    PROGBITS        0000000000000000 7c9ec8 0004f0 00      0   0  1
[39] .debug_info       PROGBITS        0000000000000000 7ca3b8 245d9a 00      0   0  1
[40] .debug_abbrev     PROGBITS        0000000000000000 a10152 019e15 00      0   0  1
[41] .debug_line       PROGBITS        0000000000000000 a29f67 163fa0 00      0   0  1
[42] .debug_str        PROGBITS        0000000000000000 b8df07 0371eb 01  MS  0   0  1
[43] .debug_loc        PROGBITS        0000000000000000 bc50f2 085fcd 00      0   0  1
[44] .debug_ranges     PROGBITS        0000000000000000 c4b0bf 056450 00      0   0  1
[45] .shstrtab         STRTAB          0000000000000000 dc54fc 0001ec 00      0   0  1
[46] .symtab           SYMTAB          0000000000000000 ca1510 084ae0 18     47 7588  8
[47] .strtab           STRTAB          0000000000000000 d25ff0 09f50c 00      0   0  1

Key to Flags:

W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)

I dont know how to get __llvm_prf_names section

ro added a comment.Aug 24 2020, 11:29 AM

First, how did you verify that this patch is the culprit? While it seems plausible, I've seen too many cases where a likely candidate turned out to be innocent, wasting everyone's time.

When chromium folks were rolling clang, it was the only suspect in code coverage changes - llvmorg-12-init-1771-g1bd7046e : llvmorg-12-init-3492-ga1caa302 [like git hash, these are tags, git checkout to them work ]

That's exactly what I said: "suspect ... changes": relying on one's suspicion can be totally misleading (been there, done that). Instead, please start with the failing version of llvm, reproduce the failure, backout just my patch (it's only a single line after all), rebuild and retry.

Can you please also provide lots of additional information:

Which platform are you seeing this error on?

Seen on linux. See red ones on https://oss-fuzz-build-logs.storage.googleapis.com/index.html#curl [for now, i had to hardcode a older clang commit]

Linux/x86_64 I assume? There's nothing red there right now.

Which revision of clang were you using? Any local changes? How exactly was cmake invoked?

https://github.com/google/oss-fuzz/blob/master/infra/base-images/base-clang/checkout_build_install_llvm.sh

IIUC you're using a modified version of clang. Please reproduce with vanilla upstream llvm first.

Which linker do you use?

System linker on ubuntu 16.04 inside docker. GNU ld (GNU Binutils for Ubuntu) 2.26.1

Can you provide the output of readelf -SW for the binary that produces the error, the version of libclang_rt.profile-*.alinked, and if possible for the input objects that have __llvm_prf_names sections?

libclang_rt.profile-x86_64.a

I meant: please provide the output of readelf -SW *for all of the binary, libclang_rt.profile-*.a, and the input objects.

readelf -SW curl_fuzzer_ldap

There are 48 section headers, starting at offset 0xdc56e8:

Section Headers:

[Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
[ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0

[...]

[16] __llvm_prf_names  PROGBITS        00000000007c2b73 3c2b73 02ac61 00   A  0   0  1

[...]

Key to Flags:

W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)

However, unlike the Solaris case my patch has fixed there's only a single instance of the __llvm_prf_names section. Which means this is a different cause of the "Malformed coverage data". There are 26 different places where this error is generated in llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp. So to proceed (after verifying that the error occurs with vanilla llvm and my patch is indeed the culprit), one needs to identify which of those instances hits us here.

I dont know how to get __llvm_prf_names section

Just run readelf -SW over every object/static library that is linked into the problematic binary.

After the verification I asked, I'll need a simple recipe to reproduce the issue. I hope you'll understand that I won't wade through 14 MB of log file (that doesn't even show the error) to guess what might go wrong and how.