This is an archive of the discontinued LLVM Phabricator instance.

[llvm-cov] Support multi-source object files for `convert-for-testing`
ClosedPublic

Authored by AtomicGu on Jul 30 2023, 5:42 AM.

Details

Summary

llvm-cov convert-for-testing only functions properly when the input binary contains a single source file. When the binary has multiple source files, a Malformed coverage data error will occur when the generated .covmapping is read back. This is because the testing format lacks support for indicating the size of its file records, and current implementation just assumes there's only one record in it. This patch fixes this problem by introducing a new testing format version.

Changes to the code:

  • Add a new format version. The version number is stored in the the last 8 bytes of the orignial magic number field to be backward-compatible.
  • Output a LEB128 number before the file records section to indicate its size in the new version.
  • Change the format parsing code correspondingly.
  • Update the document to formalize the testing format.
  • Additionally, fix the bug when converting COFF binaries.

Diff Detail

Event Timeline

AtomicGu created this revision.Jul 30 2023, 5:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 30 2023, 5:42 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
AtomicGu requested review of this revision.Jul 30 2023, 5:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 30 2023, 5:42 AM
AtomicGu edited the summary of this revision. (Show Details)Jul 30 2023, 5:42 AM
phosek added a comment.Aug 2 2023, 4:49 PM

This is a breaking change since it modifies the .covmapping format in a backwards incompatible way. For .profdata format we always try to avoid breaking changes and provide backwards compatibility, but I don't know if the same holds for the .covmapping format which is used only for testing. @davidxl do you have any thoughts on that?

llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp
908–910

It looks like in the current implementation, we're using getFilenameSize to compute the CoverageMappingSize, but CovMapHeader also contains CoverageMappingSize, accessible through getCoverageSize(). Shouldn't we be using that instead? Isn't that the issue with the current implementation?

921–922

Can you preserve this check in the new implementation?

AtomicGu marked 2 inline comments as done.Aug 4 2023, 6:27 PM
AtomicGu added inline comments.
llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp
908–910

I have inspected this variable. The value is not right. It is always zero, as the documentation says:

The length of the string in the third field of __llvm_coverage_mapping that contains any encoded coverage mapping data affixed to the coverage header. Always 0, but present for backwards compatibility.

AtomicGu updated this revision to Diff 547432.Aug 4 2023, 6:34 PM
AtomicGu marked an inline comment as done.
AtomicGu edited the summary of this revision. (Show Details)
  • [llvm-cov] Add two checks to readCoverageMappingData.
AtomicGu updated this revision to Diff 547473.Aug 5 2023, 4:15 AM
  • [llvm-cov] Add truncated checking to loadTestingFormat.
phosek added inline comments.Aug 5 2023, 10:57 AM
llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp
908–910

If it's not being used, could we reuse it rather than introducing a new field?

AtomicGu marked an inline comment as done.Aug 5 2023, 7:10 PM
AtomicGu added inline comments.
llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp
908–910

There is an old test llvm-project/llvm/test/tools/llvm-cov/Inputs/combine_expansions.covmapping which is Version1. In this case, this variable is used and the value is still not right. If we reuse this field, at least we will lose the compatibility for Version1. However I think we can add a version check and only support the multi-source covmapping after Version4. So... I think this may work!

AtomicGu updated this revision to Diff 547517.Aug 5 2023, 8:26 PM
AtomicGu marked an inline comment as done.
This comment was removed by AtomicGu.
AtomicGu updated this revision to Diff 547518.Aug 5 2023, 8:36 PM
  • [llvm-cov] use getCoverageSize() to store the CoverageMappingSize.
  • [llvm-cov] Update existing covmapping files.
AtomicGu added inline comments.Aug 5 2023, 8:44 PM
llvm/docs/CoverageMappingFormat.rst
327

It seems that the convert-for-testing command of llvm-cov does not show up in any public documentation. So I decide not to mention it here.

AtomicGu updated this revision to Diff 547534.Aug 6 2023, 1:54 AM
This comment was removed by AtomicGu.

@phosek I finished the revision. The x64 debian pre-merge checks passed. However the x64 windows pre-merge checks failed because of timeout. But I think the code is OK because It happened a few times before. Do I need to retry the pre-merge checks?

AtomicGu edited the summary of this revision. (Show Details)Aug 6 2023, 2:04 AM
gulfem added inline comments.Aug 8 2023, 5:52 PM
llvm/tools/llvm-cov/TestingSupport.cpp
142

I don't think this is the right way of fixing the issue. There are two sizes that I want to differentiate:

  1. CoverageMappingSize which is per module in CovMapHeader
  2. TotalCoverageMappingSize, which is the sum of CoverageMappingSize in all modules

CoverageMappingData.size() at line 135 gives you TotalCoverageMappingSize, and this code overwrites the first module's CoverageMappingSize with TotalCoverageMappingSize. This is the first incorrect thing here. Second incorrect thing is that even if we reuse the CoverageMappingSize field in CovMapHeader, this file is not the right place to write it. This code overwrites a value instead of writing the correct value in the first place. This is where CoverageMappingSize is written in the first place in CodeGen.
https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CoverageMappingGen.cpp#L1785

We need to encode TotalCoverageMappingSize to fix this issue, and that's why I suggest using the earlier version that you implemented which encodes CoverageMappingData.size(). I don't think using CoverageMappingSize per module would be helpful here because in order to calculate TotalCoverageMappingSize, you need to read CoverageMappingSize by reading every CovMapHeader here, which we probably do not want to do.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp#L910

gulfem added a comment.Aug 8 2023, 5:59 PM

This is a breaking change since it modifies the .covmapping format in a backwards incompatible way. For .profdata format we always try to avoid breaking changes and provide backwards compatibility, but I don't know if the same holds for the .covmapping format which is used only for testing. @davidxl do you have any thoughts on that?

@davidxl @AtomicGu is doing a GSoC project this summer, and ran into this issue in the test that he wrote in https://reviews.llvm.org/D151283. If you do not have any concern about compatibility of .covmapping format, can we land this change?

AtomicGu marked an inline comment as done.Aug 9 2023, 5:43 PM
AtomicGu added inline comments.
llvm/tools/llvm-cov/TestingSupport.cpp
142

Yes, that's what I thought at first. But @phosek wants the backward compatibility, and this is the only way. So do I need to revert to the previous version? Both version has passed the pre-merge checks.

AtomicGu updated this revision to Diff 549554.Aug 11 2023, 5:35 PM
AtomicGu marked an inline comment as done.

Revert to the previous version.

gulfem accepted this revision.Aug 11 2023, 7:19 PM

Please remove the following in the commit message:

Reuse the CoverageSize field of the first file record to store the section size after Version4.
This revision is now accepted and ready to land.Aug 11 2023, 7:19 PM
AtomicGu edited the summary of this revision. (Show Details)Aug 11 2023, 7:31 PM

@gulfem It's done. I'm waiting the pre-merge checks to be passed.

@gulfem The pre-merge checks have been passed. How to commit this review? I think I don't have the commit access.

@gulfem The pre-merge checks have been passed. How to commit this review? I think I don't have the commit access.

I'll land it for you. You can request a commit access via https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access when you have a good track record.

This is a breaking change since it modifies the .covmapping format in a backwards incompatible way. For .profdata format we always try to avoid breaking changes and provide backwards compatibility, but I don't know if the same holds for the .covmapping format which is used only for testing. @davidxl do you have any thoughts on that?

I was going to suggest bumping the coverage mapping format version if there is a concern about backwards compatibility. But, I don't think this is really changing the coverage mapping format itself. To me, it fixes the issue where we convert objects files to covmapping files, and the way we read coverage via using object files should not be impacted by that.

phosek accepted this revision.Aug 14 2023, 7:45 PM

I still have some concerns, but I think the risk is fairly low, and since no other reviewer responded, I think we can go ahead and land the change. We can always revert the change if it turns out that it broke some downstream users.

dyung added a subscriber: dyung.Aug 14 2023, 11:50 PM

Hi @AtomicGu, we have some internal tests that use covmapping files that started failing after this change. I notice as part of your commit message you stated that you "update(d) existing .covmapping files". How exactly did you do that? I'm guessing that we just need to update our private test files to work with your new format, but not being familiar with this area at all, I'm not sure how to convert our "old" .covmapping file to the "new" format that is expected. Any tips?

AtomicGu added a comment.EditedAug 15 2023, 3:40 AM

Hi @AtomicGu, we have some internal tests that use covmapping files that started failing after this change. I notice as part of your commit message you stated that you "update(d) existing .covmapping files". How exactly did you do that? I'm guessing that we just need to update our private test files to work with your new format, but not being familiar with this area at all, I'm not sure how to convert our "old" .covmapping file to the "new" format that is expected. Any tips?

I'm so sorry to hear about that. I updated those old .covmapping file by a special temporary modified version of llvm-cov. I will upload it at once for your ergency.

I thought this file format is private and shouldn't be used in any project outside LLVM, as it's not mentioned in any public document. Well, it turns out I was wrong. @gulfem @phosek Do you think we need to add a new option to covert-for-testing for potential others to update their .covmapping files?

Maybe we should have changed the magic number of it. So that it can preserve the backward compatibility.

@dyung Here is a patch for converting. Use this patch by git apply. As I said, it's very rudimentary and for temporary use.

To use it, set an environment variable LLVM_COV_NEW_TESTING_FORMAT_CVT with an output path of the new .covmapping file. Then run llvm-cov report on previous .covmapping files. You need to specify the option --instr-profile as well to make llvm-cov start successfully, although it won't affect the output.

For example:

LLVM_COV_NEW_TESTING_FORMAT_CVT=path/to/main.covmapping2 \
llvm-cov report --instr-profile any/profdata/will/do main.covmapping

However, use it cautiously. I don't know whether we will revert this path or not. How do you think? As an actual downstream user, do you think we should keep the backward compatibility, or provide a formal tool (like a new option to convert-for-testing) to convert the old format files?

Please reply to me as soon as possible. I will fix this at once when the decision is made.

AtomicGu reopened this revision.Aug 15 2023, 4:45 AM
This revision is now accepted and ready to land.Aug 15 2023, 4:45 AM
AtomicGu updated this revision to Diff 550263.Aug 15 2023, 4:46 AM
This comment was removed by AtomicGu.
AtomicGu updated this revision to Diff 550269.Aug 15 2023, 4:48 AM
This comment was removed by AtomicGu.
AtomicGu updated this revision to Diff 550271.Aug 15 2023, 4:53 AM

Use another backward-compatible way to fix the bug.

We can keep backward compatibility this time. Let’s revert the previous commit. I hope it’s not too late.
@phosek @gulfem @dyung I don't know how to revert this patch. I don't think I have the privilege to do this.

AtomicGu updated this revision to Diff 550276.Aug 15 2023, 5:13 AM
  • Support COFF binaries for convert-for-testing.
AtomicGu updated this revision to Diff 550340.Aug 15 2023, 8:16 AM
  • Now it can pass the tests.

@dyung Here is a patch for converting. Use this patch by git apply. As I said, it's very rudimentary and for temporary use.

To use it, set an environment variable LLVM_COV_NEW_TESTING_FORMAT_CVT with an output path of the new .covmapping file. Then run llvm-cov report on previous .covmapping files. You need to specify the option --instr-profile as well to make llvm-cov start successfully, although it won't affect the output.

For example:

LLVM_COV_NEW_TESTING_FORMAT_CVT=path/to/main.covmapping2 \
llvm-cov report --instr-profile any/profdata/will/do main.covmapping

However, use it cautiously. I don't know whether we will revert this path or not. How do you think? As an actual downstream user, do you think we should keep the backward compatibility, or provide a formal tool (like a new option to convert-for-testing) to convert the old format files?

Please reply to me as soon as possible. I will fix this at once when the decision is made.

Thanks for this @AtomicGu, I'll give that a try. We have private changes downstream that when it was implemented a test was added which used .covmapping files and that is what I need to update. It's not really something we ship to customers, so really all I am doing is what you did here, updating the tests.

AtomicGu retitled this revision from [llvm-cov] Fix a bug about using `convert-for-testing` on multi-source object files to [llvm-cov] Support multi-source object files for `convert-for-testing`.Aug 15 2023, 9:16 PM
AtomicGu edited the summary of this revision. (Show Details)

@gulfem All the tests have been passed. But this time we change no existing .covmapping files. I think we can land this patch with confidence now. I haven't been granted the commit access yet. So I need your help to do this.

AtomicGu updated this revision to Diff 550610.Aug 15 2023, 10:13 PM
AtomicGu edited the summary of this revision. (Show Details)

If we're going to change the test data format to encode a version, which I think is the right direction, I'd like to do it in a more systematic way. Specifically, I think we should change the magic to be just "llvmcovm" (which is 8 characters so it could be represented as uint64_t) followed by a uint64_t used for the version. We could then treat "testdata" (8387236824784925793 as uint64_t) as one of the recognized versions (ideally it would be interchangeable with 1).

AtomicGu updated this revision to Diff 550709.Aug 16 2023, 5:46 AM
  • Formalize the testing format of llvm-cov.
AtomicGu edited the summary of this revision. (Show Details)Aug 16 2023, 5:50 AM

@phosek Hi. I took your advice and reimplemented the new format. I added a new class for writing testing format and introduced the format in the document to make it more formal for the later maintainers. I think this is the best we can do now.

I added a few nit-picks. Other than that, looks good to me.

llvm/docs/CoverageMappingFormat.rst
620–621

just say for testing purposes and remove the part about including binary in the repository.

657

which is explained later.

686

Just say Version 2 is the current version, and please remove the rest (no need to explain why Version 1 is kept, etc.).

llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp
986

Some code is duplicated between the case statements, and each case statement is long to read. I think for checking versions if might be a better fit than switch because there might be several versions later. So, I would suggest this: when the TestingFormatVersion is bigger than Version1, use decodeULEB128 to read the size. The rest of the code can mostly be shared up to this point:

uint32_t FilenamesSize =
CovHeader->getFilenamesSize<support::endianness::little>();
CoverageMappingSize = sizeof(CovMapHeader) + FilenamesSize;

You can put the code above in a if statement that checks TestingFormatVersion, and I think the rest can be shared.

gulfem added inline comments.Aug 16 2023, 9:32 PM
llvm/docs/CoverageMappingFormat.rst
630

I think it will useful to document convert-for-testing in llvm-cov command guide documentation, and remove "You can't see it in any other public documents or in the --help output of llvm-cov". The details about covmapping format can stay here.

AtomicGu marked 5 inline comments as done.Aug 17 2023, 1:04 AM
AtomicGu updated this revision to Diff 551034.Aug 17 2023, 1:05 AM
  • Took advice from gulfem.
AtomicGu edited the summary of this revision. (Show Details)Aug 17 2023, 1:17 AM

@gulfem @phosek Hi. The patch passed all tests. It's ready for landing now.

gulfem accepted this revision.Aug 17 2023, 9:51 AM
gulfem added inline comments.
llvm/docs/CommandGuide/llvm-cov.rst
569

I think there is no need for this example, so you can remove this code-block.

llvm/docs/CoverageMappingFormat.rst
621

for testing purposes.

llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
273

You can turn this switch into an if, as the first case statement is not doing anything.

AtomicGu marked 3 inline comments as done.Aug 18 2023, 5:15 AM
AtomicGu updated this revision to Diff 551468.Aug 18 2023, 5:17 AM
  • Took advice from gulfem.