This is an archive of the discontinued LLVM Phabricator instance.

[llvm-cov] Enforce alignment of function records
ClosedPublic

Authored by serge-sans-paille on Jun 22 2021, 1:56 PM.

Details

Summary

Function Records are required to be aligned on 8 bits. This is enforced for each
records except the first, when one relies on the default alignment within and
std::string. There's no such guarantee, and indeed on 32 bits for some
implementation of std::string this is not enforced.

Provide a portable implementation based on llvm's MemoryBuffer.

Diff Detail

Event Timeline

serge-sans-paille requested review of this revision.Jun 22 2021, 1:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 22 2021, 1:56 PM
vsk added a subscriber: vsk.Jun 22 2021, 4:07 PM

The requirement is that the advancing by sizeof(CovMapFunctionRecordV3) bytes in the record buffer advances to the start of the next record. The address of the buffer doesn't need the same alignment.

Are you seeing an alignment-related crash, and if so, have you tried addressing it by marking CovMapFunctionRecordV3 with LLVM_PACKED?

The requirement is that the advancing by sizeof(CovMapFunctionRecordV3) bytes in the record buffer advances to the start of the next record. The address of the buffer doesn't need the same alignment.

The function advanceByOne(...) starts with the following assert: assert(isAddrAligned(Align(8), this) && "Function record not aligned"); so I tend to think each record needs to be aligned, including the first.

Take the case of the first record created, and assume it has a size multiple of 8. In that case, upon creation no padding is added (independently of it's start address). That's what the snippet below does

while (FuncRecords.size() % 8 != 0)
  FuncRecords += '\0';

Now let's assume the start address is not a multiple of 8. In that case next record won't have an address aligned to 8, etc.

A potential patch would be yo compute padding based on the start address instead of the size, but then when the std::string grows, the alignment of the data may change upon reallocation. Thus my patch.
I also tend to think that using a MemoryBuffer conveys more meaning than an std::string for what is actually a view on raw data.

vsk added a comment.Jun 23 2021, 9:32 AM

The requirement is that the advancing by sizeof(CovMapFunctionRecordV3) bytes in the record buffer advances to the start of the next record. The address of the buffer doesn't need the same alignment.

The function advanceByOne(...) starts with the following assert: assert(isAddrAligned(Align(8), this) && "Function record not aligned"); so I tend to think each record needs to be aligned, including the first.

I see, thanks for highlighting the inconsistency there. It seems like assert(isAddrAligned(Align(8), this) && ...) is overzealous: the important thing is that this points to the start of a function record, *not* that this has some particular alignment. Removing the assert would make sense to me.

Take the case of the first record created, and assume it has a size multiple of 8. In that case, upon creation no padding is added (independently of it's start address). That's what the snippet below does

while (FuncRecords.size() % 8 != 0)
  FuncRecords += '\0';

Now let's assume the start address is not a multiple of 8. In that case next record won't have an address aligned to 8, etc.

Yes, and that's fine. If that runs afoul of alignment rules (and as far as I know it doesn't, since we go through endian::byte_swap to read fields from the struct), marking the function record struct 'packed' should appease the compiler.

Do you feel I'm missing something here, e.g. are you seeing an alignment-related crash in this code?

A potential patch would be yo compute padding based on the start address instead of the size, but then when the std::string grows, the alignment of the data may change upon reallocation. Thus my patch.
I also tend to think that using a MemoryBuffer conveys more meaning than an std::string for what is actually a view on raw data.

I'm completely happy to see refactors/cleanups in this area. I just wanted to make sure we're all on the same page about whether or not there's an alignment bug here. E.g. if there is such a bug that can lead to a crash, it would make sense to add a regression test.

I'm completely happy to see refactors/cleanups in this area. I just wanted to make sure we're all on the same page about whether or not there's an alignment bug here. E.g. if there is such a bug that can lead to a crash, it would make sense to add a regression test.

Actually, the test already exist: current code crashes the test suite on llvm-conv/*branch* tests on RHEL7 on i686 machine. Let me try to explain the bug in a different way:

Assume the buffer allocation starts at address 0x04. Then we have a record of, say, 0x10 bytes. That size is a multiple of 8 so no padding is added, and the second record starts at address 0x14.

Now let's consider the record loading. Loading starts at address 0x04 and that's fine. It loads the expected 0x10 bytes and that's fine too. Then it wants to load the second record, and to do so it computes it's alignment *based on the address*. 0x14 is not aligned on 8, the aligned address for the second record is computed as 0x18, and that's wrong.

Why does this happen? The padding for the record buffer construction is computed based on *size* while the padding for the record loading is computed based on *address*. What my patch does is basically make sure that both abstraction are in sync: size multiple of 8 means address multiple of 8.

vsk accepted this revision.Jun 24 2021, 12:55 PM

I'm completely happy to see refactors/cleanups in this area. I just wanted to make sure we're all on the same page about whether or not there's an alignment bug here. E.g. if there is such a bug that can lead to a crash, it would make sense to add a regression test.

Actually, the test already exist: current code crashes the test suite on llvm-conv/*branch* tests on RHEL7 on i686 machine. Let me try to explain the bug in a different way:

My apologies, I wasn't aware of the breakage. Thanks for taking the time to patiently explain the situation.

Assume the buffer allocation starts at address 0x04. Then we have a record of, say, 0x10 bytes. That size is a multiple of 8 so no padding is added, and the second record starts at address 0x14.

Now let's consider the record loading. Loading starts at address 0x04 and that's fine. It loads the expected 0x10 bytes and that's fine too. Then it wants to load the second record, and to do so it computes it's alignment *based on the address*. 0x14 is not aligned on 8, the aligned address for the second record is computed as 0x18, and that's wrong.
Why does this happen? The padding for the record buffer construction is computed based on *size* while the padding for the record loading is computed based on *address*. What my patch does is basically make sure that both abstraction are in sync: size multiple of 8 means address multiple of 8.

I think I see the issue now:

template <support::endianness Endian>
std::pair<const char *, const CovMapFunctionRecordV3 *>
advanceByOne(const char *) const {
  assert(isAddrAligned(Align(8), this) && "Function record not aligned");
  const char *Next = ((const char *)this) + sizeof(CovMapFunctionRecordV3) -
                     sizeof(char) + getDataSize<Endian>();
  // Each function record has an alignment of 8, so we need to adjust
  // alignment before reading the next record.
  Next += offsetToAlignedAddr(Next, Align(8)); //<< HERE (offset to next aligned record calculated from potentially unaligned address)
  return {nullptr, reinterpret_cast<const CovMapFunctionRecordV3 *>(Next)};
}

The proposed fix lgtm.

This revision is now accepted and ready to land.Jun 24 2021, 12:55 PM
MaskRay accepted this revision.Jun 24 2021, 11:26 PM
MaskRay added inline comments.
llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp
1017

worth a comment that this block computes the size taking into account of padding.

1039–1046
MaskRay added a comment.EditedJun 24 2021, 11:26 PM

Function Records are required to be aligned on 8 bits.

8 bytes.

within and std::string

and -> an

This revision was landed with ongoing or failed builds.Jun 25 2021, 1:56 AM
This revision was automatically updated to reflect the committed changes.