This is an archive of the discontinued LLVM Phabricator instance.

[UBSan] DO NOT COMMIT: precise UBSan checks experiment
Abandoned, Public

Authored by alekseyshl on May 31 2018, 11:50 AM.

Details

Reviewers
javed.absar
Summary

LLVM part of the two-part patch (LLVM + clang)

clang part: https://reviews.llvm.org/D47600

An experiment: implement precise UBSan checks on 32-bit ARM.

UBSan's current "one trap per function" approach has a few issues:

  • the report doesn't tell us the exact offending instruction, since all conditional branches lead to the same trap instruction
  • the trap instruction costs some code size, even though it's never executed
  • *MAYBE* the branch predictor is polluted by lots of untaken branches

The idea is to inject a "conditional trap" instruction (which does not exist
in the current instruction set) at each potential failure point to get
precise reporting and avoid polluting the branch predictor.
To simulate it on the current instruction set, SVCxx 0xFFFFFF was chosen,
where "xx" is a predicate (overflow etc.), so the UBSan-instrumented code
changes from this:

adds    r0, r0, r1
bvc     L
udf     #65006

L: bx lr

to this:

adds    r0, r0, r1
svcvs   0x00ffffff
bx      lr
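
To make the predicate concrete, here is a small Python model of what the `adds` / `svcvs` pair checks. It is only a sketch and not part of the patch: `adds`, `checked_add` and `Trap` are hypothetical names introduced for illustration, and the model assumes both operands already fit in a signed 32-bit register.

```python
class Trap(Exception):
    """Stands in for the trapping instruction (hypothetical helper)."""


def adds(a, b):
    """Model of a 32-bit `adds`: return (wrapped signed result, V flag)."""
    exact = a + b
    # Wrap the exact sum into the signed 32-bit range, as the register would.
    wrapped = (exact + 2**31) % 2**32 - 2**31
    # V (signed overflow) is set when the exact sum does not fit in 32 bits.
    return wrapped, wrapped != exact


def checked_add(a, b):
    """Model of `adds` followed by a trap that fires only when V is set."""
    result, v = adds(a, b)
    if v:
        raise Trap("signed overflow in add")
    return result


print(adds(1, 2))          # (3, False)
print(adds(2**31 - 1, 1))  # (-2147483648, True)
```

In this model the trap sits directly after the addition it guards, which is what allows a report to name the exact failing operation.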

Two UBSan-heavy projects were used for benchmarking, bzip2
(http://www.bzip.org/1.0.6/bzip2-1.0.6.tar.gz) and 464.h264ref from
the SPEC_CPU2006v1.2 package, comparing two instrumented versions, one
with the current UBSan implementation and another with the simulated
conditional trap.

Measurements show that code size increased for both projects:

  • bzip2 - ~3%
  • h264ref - ~5%

The size increase might be attributed to more precise checks, for example,
a + b + c with overflow check:

adds    r0, r0, r1
addsvc  r0, r0, r2
bvs     trap

now looks like this:

adds    r0, r0, r1
svcvs   0x00ffffff
adds    r0, r0, r2
svcvs   0x00ffffff

Likely, other optimisations are inhibited by these new checks too.
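
The trade-off in the a + b + c example can be sketched with a small Python model. This is an illustration under stated assumptions, not the patch's logic: `adds`, `Trap`, `sum3_shared_trap`, `sum3_precise` and `trap_site` are hypothetical names, and operands are assumed to fit in a signed 32-bit register. The shared-trap version skips the second add once overflow is seen and reports through a single trap, so the failing operation is unknown; the precise version checks after every add and can name it.

```python
INT_MAX = 2**31 - 1


class Trap(Exception):
    """Stands in for a trap; `site` records what a report could identify."""

    def __init__(self, site):
        super().__init__(site)
        self.site = site


def adds(a, b):
    """Model of a 32-bit `adds`: return (wrapped signed result, V flag)."""
    exact = a + b
    wrapped = (exact + 2**31) % 2**32 - 2**31
    return wrapped, wrapped != exact


def sum3_shared_trap(a, b, c):
    """Current scheme: predicated second add, one branch to a shared trap."""
    r, v = adds(a, b)
    if not v:  # the second add runs only while V is clear
        r, v = adds(r, c)
    if v:
        raise Trap("unknown")  # shared trap: the failing add is not identified
    return r


def sum3_precise(a, b, c):
    """Experimental scheme: a conditional trap after every add."""
    r, v = adds(a, b)
    if v:
        raise Trap("a + b")
    r, v = adds(r, c)
    if v:
        raise Trap("(a + b) + c")
    return r


def trap_site(fn, *args):
    """Run fn and return the trap site it reports, or "no trap"."""
    try:
        fn(*args)
    except Trap as t:
        return t.site
    return "no trap"


print(trap_site(sum3_shared_trap, INT_MAX, 0, 1))  # unknown
print(trap_site(sum3_precise, INT_MAX, 0, 1))      # (a + b) + c
```

The precise version needs one check per operation instead of one per chain, which matches the extra instruction in the listing above.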

The performance is also worse on both little and big cores:

  • bzip2 - ~108% (little core) and ~415% (big core) of baseline
  • h264ref - ~117% (little core) and ~340% (big core) of baseline

Using other instructions instead of SVCxx (NOPxx, for example) still shows
a performance hit. On the big core the difference is not as dramatic as
with SVCxx, but it is still in the range of 2% - 35% across various
tests and instructions.

The conclusion: using the existing ARM instructions to implement
precise UBSan checks achieves the stated goal, precise error reports,
but seems to be impractical in terms of code size and performance.

Event Timeline

alekseyshl created this revision. May 31 2018, 11:50 AM
alekseyshl edited the summary of this revision. May 31 2018, 11:55 AM
alekseyshl removed a reviewer: javed.absar.
alekseyshl removed subscribers: mgorny, kristof.beyls.

No need to review it, Javed. I uploaded it for sharing and history.

I won't. Thanks for sharing though.

alekseyshl abandoned this revision. Jun 5 2018, 10:29 AM

Experimental.