LLVM part of a two-part patch (LLVM + clang).
clang part: https://reviews.llvm.org/D47600
An experiment: implement precise UBSan checks on 32-bit ARM.
The current UBSan's "one trap per function" approach has a few issues:
- the report doesn't tell us the exact offending instruction since all conditional branches lead to the same trap instruction
- the trap instruction costs some code size, even though it's never executed
- *MAYBE* the branch predictor is polluted by lots of un-taken branches
The idea is to inject a "conditional trap" instruction (which does not exist
in the current instruction set) at each potential failure point to get
precise reporting and avoid polluting the branch predictor.
To simulate it on the current instruction set, SVCxx 0xFFFFFF was chosen,
where "xx" is a predicate (overflow etc.), so the UBSan-instrumented code
changes from this:

    adds r0, r0, r1
    bvc L
    udf #65006
L:  bx lr

to this:

    adds r0, r0, r1
    svcvs 0x00ffffff
    bx lr
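
For reference, the check that both instruction sequences implement can be sketched at the C level. This is only an illustration, not code from the patch: the helper name add_overflows is hypothetical, and it returns a flag where the instrumented code would trap. It assumes a compiler that provides the GCC/Clang __builtin_add_overflow builtin.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (not part of the patch): reports whether a + b
   overflows a signed int. The wrapped result is stored in *out either way.
   In the instrumented code the "true" case is where the trap would fire,
   i.e. the V flag is set after ADDS. */
static bool add_overflows(int a, int b, int *out) {
    return __builtin_add_overflow(a, b, out);
}
```

With this sketch, add_overflows(INT_MAX, 1, &r) reports an overflow, while add_overflows(1, 2, &r) does not and leaves 3 in r.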
Two UBSan-heavy projects were used for benchmarking: bzip2
(http://www.bzip.org/1.0.6/bzip2-1.0.6.tar.gz) and 464.h264ref from
the SPEC_CPU2006v1.2 package. Two instrumented versions of each were
compared, one with the current UBSan implementation and one with the
simulated conditional trap.
Measurements show that code size increased for both projects:
- bzip2 - ~3%
- h264ref - ~5%
The size increase might be attributed to more precise checks, for example,
a + b + c with overflow check:

    adds r0, r0, r1
    addsvc r0, r0, r2
    bvs trap

now looks like this:

    adds r0, r0, r1
    svcvs 0x00ffffff
    adds r0, r0, r2
    svcvs 0x00ffffff
Likely, other optimisations are inhibited by these new checks too.
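
The difference between one shared check and one check per operation can be sketched in C. This is an illustrative sketch only: the helper name add3_first_overflow and its return convention are hypothetical, and it assumes the GCC/Clang __builtin_add_overflow builtin. It reports which addition overflowed, which is the information the precise variant preserves and the shared trap loses.

```c
#include <limits.h>

/* Hypothetical helper (not part of the patch): evaluates a + b + c with a
   check after each addition, mirroring the "svcvs after every adds" form.
   Returns 0 if nothing overflows (result in *out), 1 if a + b overflows,
   2 if (a + b) + c overflows. */
static int add3_first_overflow(int a, int b, int c, int *out) {
    int tmp;
    if (__builtin_add_overflow(a, b, &tmp))
        return 1; /* first failure point */
    if (__builtin_add_overflow(tmp, c, out))
        return 2; /* second failure point */
    return 0;
}
```

Each early return corresponds to a separate failure point, so the checks cannot be folded into a single conditional branch at the end of the sequence.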
Performance is also worse on both little and big cores:
- bzip2 - ~108% (little core) and ~415% (big core) of baseline
- h264ref - ~117% (little core) and ~340% (big core) of baseline
Using other instructions instead of SVCxx (NOPxx, for example) still shows
a performance hit. It is less dramatic on the big core than with SVCxx,
but still in the range of 2% - 35% across various tests and instructions.
The conclusion: using the existing ARM instructions to implement
precise UBSan checks achieves the goal of precise error reports,
but seems to be impractical in terms of code size and performance.