This is an archive of the discontinued LLVM Phabricator instance.

[ubsan] Minimize size of data for type_mismatch
AbandonedPublic

Authored by filcab on Apr 28 2016, 9:41 AM.

Details

Summary

This is the compiler-rt side of D19667.

Diff Detail

Event Timeline

filcab updated this revision to Diff 55429.Apr 28 2016, 9:41 AM
filcab retitled this revision from to [ubsan] Minimize size of data for type_mismatch.
filcab updated this object.
filcab added reviewers: kcc, samsonov, rsmith.
filcab added a subscriber: llvm-commits.
filcab added a subscriber: filcab.Apr 28 2016, 9:45 AM

We can think about adding a piece of data (we have 6 bytes which will
end up being used for padding) that tells us we have a "V2" struct
layout.

If we feel strongly about being able to use older objects, then we can
add an unsigned char after TypeCheckKind, for example. Since
TypeCheckKind can be 0, we have no reliable way to figure out if we're
looking at a new layout (we could see only one bit set in both chars.
This could be because we were looking at an old-style alignment
(uptr), or a new style (char) plus TCK set to 0).

Kostya: Any thoughts?

Thank you,

Filipe
bcraig added a subscriber: bcraig.Apr 28 2016, 11:23 AM

If we want to be able to reuse old objects, then you could put a Version member between Alignment and TypeCheckKind.

In the old version, alignment always stored a power of two. That means that only a single bit was set in the four bytes available. If you put something with more than one bit in any one of those bytes, then we can tell that it isn't a version zero struct.

//exposition only
#define VERSION_ZERO 0x00
#define VERSION_ONE 0x80 | 0x01
#define VERSION_TWO 0x80 | 0x02

struct TypeMismatchData {
  SourceLocation Loc;
  const TypeDescriptor &Type;
  unsigned char Alignment;
  unsigned char Version = VERSION_ONE;
  unsigned char TypeCheckKind;
};

Hi Ben,

No need to OR with 0x80. If we have two bits set in the first two
bytes, then we know it's (at least) V2 (we can set Alignment to 1 when
we've been given 0, and have one bit from there).
I was probing to know if it's an actual use-case people have been
using UBSan with (which makes it very desirable), or if we simply do
not care that much. If people have seen this in the wild (I'm ok with
someone saying they've been using it that way internally, too), then
it's obvious we want to try to keep compatibility.

Thank you,

Filipe
filcab updated this revision to Diff 57352.May 16 2016, 8:24 AM

Added backwards compatibility.

kcc edited edge metadata.May 26 2016, 9:51 AM

Sorry, I missed this CL.
Could you please repeat why we need backwards compatibility?

I would rather prefer to not have this complexity, but instead make sure we don't mix old API and the new one,
e.g. the same way we do it in asan. (see asan/asan_init_version.h)

In D19668#441079, @kcc wrote:

Sorry, I missed this CL.
Could you please repeat why we need backwards compatibility?

I would rather prefer to not have this complexity, but instead make sure we don't mix old API and the new one,
e.g. the same way we do it in asan. (see asan/asan_init_version.h)

The backwards compatibility was to be able to mix older files, like when we upgraded the float cast overflow to be able to deal with having the source location or not.

From your comment it seems you'd prefer one of:

  • Just use the newer and users have to match versions of clang with the ubsan runtime (I think this is acceptable, but I might be an edge case ;-) )
  • Create __ubsan_handle_type_mismatch2 and __ubsan_handle_type_mismatch2_abort (or similar) and use the new struct with that (no need to add a Version field if we do it this way, but we'll have functions that might just start to bitrot)

I think doing ABI versions the way ASan does them doesn't really work that well for UBSan, since the checks are mostly spot checks, which don't depend on each other. I think the best it to either don't version it and say users are responsible for matching revs, or have versions per check.

Which one would you prefer (I have mild preference for one, but no strong opinion)?

kcc added a comment.May 26 2016, 11:28 AM
In D19668#441079, @kcc wrote:

Sorry, I missed this CL.
Could you please repeat why we need backwards compatibility?

I would rather prefer to not have this complexity, but instead make sure we don't mix old API and the new one,
e.g. the same way we do it in asan. (see asan/asan_init_version.h)

The backwards compatibility was to be able to mix older files, like when we upgraded the float cast overflow to be able to deal with having the source location or not.

From your comment it seems you'd prefer one of:

  • Just use the newer and users have to match versions of clang with the ubsan runtime (I think this is acceptable, but I might be an edge case ;-) )
  • Create __ubsan_handle_type_mismatch2 and __ubsan_handle_type_mismatch2_abort (or similar) and use the new struct with that (no need to add a Version field if we do it this way, but we'll have functions that might just start to bitrot)

I think doing ABI versions the way ASan does them doesn't really work that well for UBSan, since the checks are mostly spot checks, which don't depend on each other. I think the best it to either don't version it and say users are responsible for matching revs, or have versions per check.

I don't want to have multiple versions in the code base as it makes code maintenance more expensive.
Making "users are responsible for X" work only if the users know when the break X.
Will they know it in this case, or things will break silently?

Which one would you prefer (I have mild preference for one, but no strong opinion)?

Things will break loudly:

Newer object file with older lib:
Out of bounds access guaranteed (strict became smaller), TypeCheckKind will
be wrong, as well as Alignment.
I don't expect users to use an older runtime with a newer clang to happen
much.

Old objects with newer lib:
Those fields will be wrong too.

If we have the Version field (same struct size as without it), we can
differentiate, and we can make old objects on the newer lib still work
(same thing we did for FloatCastOverflow).

I expect lots of false positives with the first two, which should make the
users double-check their assumptions. But might end up making people spend
a good chunk of time debugging until they find it out.

vsk added a subscriber: vsk.Nov 29 2016, 11:32 AM
filcab abandoned this revision.Dec 15 2016, 7:28 AM

I will post a new patch, using the features from rL289444.